{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 数据框出报表：数据科学之分进合击分析策略\n",
    "![Data Science](https://www.kdnuggets.com/wp-content/uploads/data-science-process.jpg)\n",
    "上周主要内容：\n",
    "*  熊猫(pandas)简介：pandas模块之数据框DataFrame()基本方法和数据科学流程关系说明(框..变量..观察..系列)\n",
    "  * **将代码当成人类语言**用**片语化**记忆，并配合\n",
    "  * **将数据处理输入输出**用**视觉语言**记忆\n",
    "\n",
    "本周主要内容：\n",
    "* 数据科学之分进合击分析策略\n",
    "  * 什麽是split-combine-apply\n",
    "  * 这为什麽对\"出报表\"很重要？\n",
    "* 分进合击操演的心法及剑法学习\n",
    "  * 区分变量性质/数据型态 (variable types / data types)  \n",
    "  * 分进多为\"类别\"，可以是定类或定序\n",
    "  * 合击常为\"数量\"，可以是定距或定比\n",
    "* 操演的范围\n",
    "  * 以\"类别\"分进，以\"数量\"合击\n",
    "  * 2019胡润全球独角兽榜以数个\"类别\"分进，再以以\"数量\"合击\n",
    "\n",
    "-----\n",
    "\n",
    "本电子讲义为一系列课程的主要教材\n",
    "*  课程：20春_数据分析pandas （中山大学南方学院）\n",
    "*  设计者：廖汉腾, 许智超\n",
    "* 参考来源: [官方英文新手教程](https://pandas.pydata.org/pandas-docs/version/1.0.2/getting_started/index.html#getting-started)\n",
    "\n",
    "-----\n",
    "\n",
    "课堂教学方式：\n",
    "* 分段式以英文新手教程的内容做示范及说明\n",
    "* 课堂上以实际中文数据做操练，每段约10-15分钟\n",
    "* 抽学生联mic自播说明难点及成果点，教师总结"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>\n",
       "/* 本电子讲义使用之CSS */\n",
       "div.code_cell {\n",
       "    background-color: #e5f1fe;\n",
       "}\n",
       "div.cell.selected {\n",
       "    background-color: #effee2;\n",
       "    font-size: 2rem;\n",
       "    line-height: 2.4rem;\n",
       "}\n",
       "div.cell.selected .rendered_html table {\n",
       "    font-size: 2rem !important;\n",
       "    line-height: 2.4rem !important;\n",
       "}\n",
       ".rendered_html pre code {\n",
       "    background-color: #C4E4ff;   \n",
       "    padding: 2px 25px;\n",
       "}\n",
       ".rendered_html pre {\n",
       "    background-color: #99c9ff;\n",
       "}\n",
       "div.code_cell .CodeMirror {\n",
       "    font-size: 2rem !important;\n",
       "    line-height: 2.4rem !important;\n",
       "}\n",
       ".rendered_html img, .rendered_html svg {\n",
       "    max-width: 60%;\n",
       "    height: auto;\n",
       "    float: right;\n",
       "}\n",
       "\n",
       ".rendered_html img[src*=\"#full\"], .rendered_html svg[src*=\"#full\"] {\n",
       "    max-width: 100%;\n",
       "    height: auto;\n",
       "    float: none;\n",
       "}\n",
       "\n",
       ".rendered_html img[src*=\"#thumbnail\"], .rendered_html svg[src*=\"#thumbnail\"] {\n",
       "    max-width: 15%;\n",
       "    height: auto;\n",
       "}\n",
       "\n",
       "/* Gradient transparent - color - transparent */\n",
       "hr {\n",
       "    border: 0;\n",
       "    border-bottom: 1px dashed #ccc;\n",
       "}\n",
       ".emoticon{\n",
       "    font-size: 5rem;\n",
       "    line-height: 4.4rem;\n",
       "    text-align: center;\n",
       "    vertical-align: middle;\n",
       "}\n",
       ".bg-split_apply_comine {\n",
       "    width: 500px;     \n",
       "    height: 300px;\n",
       "    background: url('02_split-apply-comine_500x300.png') -10px -10px;\n",
       "    float: right;\n",
       "}\n",
       ".bg-comine {\n",
       "    width: 175px;\n",
       "    height: 150px;\n",
       "    background: url('02_split-apply-comine_500x300.png') -280px -80px;\n",
       "    float: right;\n",
       "}\n",
       ".bg-apply {\n",
       "    width: 155px;\n",
       "    height: 225px;\n",
       "    background: url('02_split-apply-comine_500x300.png') -160px -30px;\n",
       "    float: right;\n",
       "}\n",
       ".bg-split {\n",
       "    width: 205px;\n",
       "    height: 225px;\n",
       "    background: url('02_split-apply-comine_500x300.png') -10px -30px;\n",
       "    float: right;\n",
       "}\n",
       ".break {\n",
       "                   page-break-after: right; \n",
       "                   width:700px;\n",
       "                   clear:both;\n",
       "}\n",
       "</style>\n"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "%%html\n",
    "<style>\n",
    "/* 本电子讲义使用之CSS */\n",
    "div.code_cell {\n",
    "    background-color: #e5f1fe;\n",
    "}\n",
    "div.cell.selected {\n",
    "    background-color: #effee2;\n",
    "    font-size: 2rem;\n",
    "    line-height: 2.4rem;\n",
    "}\n",
    "div.cell.selected .rendered_html table {\n",
    "    font-size: 2rem !important;\n",
    "    line-height: 2.4rem !important;\n",
    "}\n",
    ".rendered_html pre code {\n",
    "    background-color: #C4E4ff;   \n",
    "    padding: 2px 25px;\n",
    "}\n",
    ".rendered_html pre {\n",
    "    background-color: #99c9ff;\n",
    "}\n",
    "div.code_cell .CodeMirror {\n",
    "    font-size: 2rem !important;\n",
    "    line-height: 2.4rem !important;\n",
    "}\n",
    ".rendered_html img, .rendered_html svg {\n",
    "    max-width: 60%;\n",
    "    height: auto;\n",
    "    float: right;\n",
    "}\n",
    "\n",
    ".rendered_html img[src*=\"#full\"], .rendered_html svg[src*=\"#full\"] {\n",
    "    max-width: 100%;\n",
    "    height: auto;\n",
    "    float: none;\n",
    "}\n",
    "\n",
    ".rendered_html img[src*=\"#thumbnail\"], .rendered_html svg[src*=\"#thumbnail\"] {\n",
    "    max-width: 15%;\n",
    "    height: auto;\n",
    "}\n",
    "\n",
    "/* Gradient transparent - color - transparent */\n",
    "hr {\n",
    "    border: 0;\n",
    "    border-bottom: 1px dashed #ccc;\n",
    "}\n",
    ".emoticon{\n",
    "    font-size: 5rem;\n",
    "    line-height: 4.4rem;\n",
    "    text-align: center;\n",
    "    vertical-align: middle;\n",
    "}\n",
    ".bg-split_apply_comine {\n",
    "    width: 500px;     \n",
    "    height: 300px;\n",
    "    background: url('02_split-apply-comine_500x300.png') -10px -10px;\n",
    "    float: right;\n",
    "}\n",
    ".bg-comine {\n",
    "    width: 175px;\n",
    "    height: 150px;\n",
    "    background: url('02_split-apply-comine_500x300.png') -280px -80px;\n",
    "    float: right;\n",
    "}\n",
    ".bg-apply {\n",
    "    width: 155px;\n",
    "    height: 225px;\n",
    "    background: url('02_split-apply-comine_500x300.png') -160px -30px;\n",
    "    float: right;\n",
    "}\n",
    ".bg-split {\n",
    "    width: 205px;\n",
    "    height: 225px;\n",
    "    background: url('02_split-apply-comine_500x300.png') -10px -30px;\n",
    "    float: right;\n",
    "}\n",
    ".break {\n",
    "                   page-break-after: right; \n",
    "                   width:700px;\n",
    "                   clear:both;\n",
    "}\n",
    "</style>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 本周内容\n",
    "<div class=\"bg-split_apply_comine\"></div>\n",
    "\n",
    "本周内容共分5段\n",
    "1. 分进合击出报表\n",
    "2. 合合合：合击出报表常用计算的设计安排\n",
    "3. 进进进：进击出报表常用计算的选项\n",
    "4. 分分分：从df.groupby开展对知识领域丶统计丶及数据管理的数据形态及标准\n",
    "5. 数据感：从聚合到解聚的操作感\n",
    "\n",
    "\n",
    "\n",
    "左图（来源：[sunscrapers](https://sunscrapers.com/?utm_source=medium&utm_medium=article)）的最右边的积木飞机是不是很具设计感呢？\n",
    "\n",
    "是不是从最左边的来源，按照某种设计安排分类（中间看起来是按颜色）后，再组合起来成积木飞机。\n",
    "\n",
    "本周的学习旅程，一改传统编程从头说起的故事，让我们以倒敍的方式\n",
    "\n",
    "<br class=\"break\">\n",
    "<br class=\"break\">\n",
    "\n",
    "<div style=\"text-align:center; font-size:36px\">\n",
    "     \n",
    "展开我们有<mark style=\"text-align:center\">数据感</mark> 的 <mark style=\"text-align:center; font-size:36px\">从聚合到解聚</mark> 之旅\n",
    "\n",
    "</div>\n",
    "\n",
    "-----\n",
    "\n",
    "![06_groupby.svg](https://pandas.pydata.org/pandas-docs/version/1.0.2/_images/06_groupby.svg)\n",
    "\n",
    "* [分进合击出报表](#分进合击出报表)\n",
    "\n",
    "> 具体实操例子：问题 独角兽企业<mark>按国家和行业别</mark>有没有差别？此差别展现在(a)数量丶(b)估值及其分布丶(c)成立年份及其分布是否有差异？初阶使用<mark>Pandas groupby 出报表</mark>必会心法 \n",
    "\n",
    "<br class=\"break\">\n",
    "\n",
    "-----\n",
    "\n",
    "<div class=\"bg-comine\"></div>\n",
    "\n",
    "* [合合合](#合合合)：合击出报表常用计算的设计安排\n",
    "\n",
    "> <mark>合合合</mark>，作为数据科学家，我们先想像好胜利的合流数据队伍，先求设计安排好报表的样子。\n",
    "\n",
    "<br class=\"break\">\n",
    "\n",
    "-----\n",
    "\n",
    "<div class=\"bg-apply\"></div>\n",
    "\n",
    "* [进进进](#进进进)：进击出报表常用计算的选项\n",
    "\n",
    "> <mark>进进进</mark>，本来拿来全数据计算的，现在可以分头进行数据计算，有什麽常用的计算选项呢？他们对应到什麽样的数据类型？数据融合及拆解进击出报表常用计算的选项要上哪找？\n",
    "\n",
    "<br class=\"break\">\n",
    "\n",
    "-----\n",
    "\n",
    "<div class=\"bg-split\"></div>\n",
    "\n",
    "* [分分分](#分分分)：从df.groupby开展\n",
    "\n",
    "> <mark>分分分</mark>，接续上周的**切切切**切片 (英文叫slice)，groupby的分分分，是数据科学家将**切切切**的数据解剖刀，在找突破点的后，系统地把全数据拆分多块。<mark>大卸八块</mark>后好分迸合击。要如何分，不只是要会df.groupby的参数始使用，更是开展对知识领域丶统计丶及数据管理的数据形态及标准的数据感修练之旅\n",
    "\n",
    "<br class=\"break\">\n",
    "\n",
    "-----\n",
    "\n",
    "![02_split-apply-comine_detailed.png](02_split-apply-comine_500x300.png#full)\n",
    "\n",
    "* [数据感](#数据感)：从聚合到解聚的操作感\n",
    "\n",
    "> 数据之天下，天下之数据，合久必分分久必合，从聚合到解聚的操作感是需要我们能掌握的，让我们来总结一下今天学到的数据感，并标记着今日为\"数据感\"修练的里程碑\n",
    "\n",
    "\n",
    "<div class=\"emoticon\">😃😄😁</div>\n",
    "\n",
    "----- \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 分进合击出报表\n",
    "\n",
    "分进合击出报表具体实操例子：\n",
    "\n",
    "独角兽企业<mark>按国家和行业别</mark>有没有差别？此差别展现在(a)数量丶(b)估值及其分布丶(c)成立年份及其分布是否有差异？\n",
    "\n",
    "初阶使用<mark>Pandas groupby 出报表</mark>必会剑法心法 \n",
    "\n",
    "1. 分进合击之pandas剑法\n",
    "  * 分 groupby\n",
    "  * 迸 count, sum, mean, max, min\n",
    "  * 合 agg\n",
    "2. 分进合击之数据科学心法\n",
    "  * 分 split\n",
    "  * 迸 apply\n",
    "  * 合 combine\n",
    "3. 出报表剑法\n",
    "  * 一EXCEL档多分页法: \n",
    "     * with pd.ExcelWriter() as writer:\n",
    "     * with open() as fp:\n",
    "  * rename改名法\n",
    "     * 先练改columns名称(i.e. 变数名称)\n",
    "     * (以后)再练改index名称(i.e. 观察名称)\n",
    "  * sort_values 排序法\n",
    "     * 高阶多索引排序以后细练 \n",
    "     * 今日先比划比划先过\n",
    "     \n",
    "     \n",
    "以下以\n",
    "\n",
    "-----\n",
    "\n",
    "<div style=\"text-align:center\">\n",
    "     \n",
    "<mark style=\"text-align:center; font-size:36px\">倒敍</mark>的方式 \n",
    "\n",
    "</div>\n",
    "\n",
    "展开我们有从聚合到解聚 的 <mark>数据感</mark> 之旅\n",
    "\n",
    "----- "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 494 entries, 0 to 493\n",
      "Data columns (total 10 columns):\n",
      " #   Column        Non-Null Count  Dtype \n",
      "---  ------        --------------  ----- \n",
      " 0   排名            494 non-null    int64 \n",
      " 1   企业名称          494 non-null    object\n",
      " 2   Company Name  494 non-null    object\n",
      " 3   估值（亿人民币）      494 non-null    int64 \n",
      " 4   国家            494 non-null    object\n",
      " 5   城市            494 non-null    object\n",
      " 6   行业            494 non-null    object\n",
      " 7   掌门人/创始人       494 non-null    object\n",
      " 8   成立年份          494 non-null    int64 \n",
      " 9   部分投资机构        494 non-null    object\n",
      "dtypes: int64(3), object(7)\n",
      "memory usage: 38.7+ KB\n"
     ]
    }
   ],
   "source": [
    "# A0 简单读档并查看数据框讯息\n",
    "# 注意看Dtype! \n",
    "df = pd.read_csv (\"20春_pandas_week02_hurun_unicorn.tsv\", encoding = \"utf8\", sep=\"\\t\")\n",
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 挑战A1：如法泡制\n",
    "老板说，上次的EXCEL表做的不错，他还想多来2页：\n",
    "\n",
    "* 先国再城\n",
    "* 先行再城"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>企业名称</th>\n",
       "      <th colspan=\"2\" halign=\"left\">估值（亿人民币）</th>\n",
       "      <th colspan=\"2\" halign=\"left\">成立年份</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>数量</th>\n",
       "      <th>总和</th>\n",
       "      <th>均值</th>\n",
       "      <th>最新</th>\n",
       "      <th>最早</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>国家</th>\n",
       "      <th>行业</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">中国</th>\n",
       "      <th>金融科技</th>\n",
       "      <td>22</td>\n",
       "      <td>17960</td>\n",
       "      <td>816.363636</td>\n",
       "      <td>2018</td>\n",
       "      <td>2002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>媒体和娱乐</th>\n",
       "      <td>17</td>\n",
       "      <td>8230</td>\n",
       "      <td>484.117647</td>\n",
       "      <td>2015</td>\n",
       "      <td>2003</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"3\" valign=\"top\">美国</th>\n",
       "      <th>云计算</th>\n",
       "      <td>32</td>\n",
       "      <td>6880</td>\n",
       "      <td>215.000000</td>\n",
       "      <td>2015</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>共享经济</th>\n",
       "      <td>6</td>\n",
       "      <td>5670</td>\n",
       "      <td>945.000000</td>\n",
       "      <td>2017</td>\n",
       "      <td>2008</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>金融科技</th>\n",
       "      <td>21</td>\n",
       "      <td>5020</td>\n",
       "      <td>239.047619</td>\n",
       "      <td>2017</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>日本</th>\n",
       "      <th>区块链</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2014</td>\n",
       "      <td>2014</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"2\" valign=\"top\">法国</th>\n",
       "      <th>人工智能</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>媒体和娱乐</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2006</td>\n",
       "      <td>2006</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>爱沙尼亚</th>\n",
       "      <th>共享经济</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2013</td>\n",
       "      <td>2013</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>法国</th>\n",
       "      <th>健康科技</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2013</td>\n",
       "      <td>2013</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           企业名称 估值（亿人民币）              成立年份      \n",
       "             数量       总和          均值    最新    最早\n",
       "国家   行业                                         \n",
       "中国   金融科技    22    17960  816.363636  2018  2002\n",
       "     媒体和娱乐   17     8230  484.117647  2015  2003\n",
       "美国   云计算     32     6880  215.000000  2015  2000\n",
       "     共享经济     6     5670  945.000000  2017  2008\n",
       "     金融科技    21     5020  239.047619  2017  2000\n",
       "...         ...      ...         ...   ...   ...\n",
       "日本   区块链      1       70   70.000000  2014  2014\n",
       "法国   人工智能     1       70   70.000000  2016  2016\n",
       "     媒体和娱乐    1       70   70.000000  2006  2006\n",
       "爱沙尼亚 共享经济     1       70   70.000000  2013  2013\n",
       "法国   健康科技     1       70   70.000000  2013  2013\n",
       "\n",
       "[103 rows x 5 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>企业名称</th>\n",
       "      <th colspan=\"2\" halign=\"left\">估值（亿人民币）</th>\n",
       "      <th colspan=\"2\" halign=\"left\">成立年份</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>数量</th>\n",
       "      <th>总和</th>\n",
       "      <th>均值</th>\n",
       "      <th>最新</th>\n",
       "      <th>最早</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>行业</th>\n",
       "      <th>国家</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>金融科技</th>\n",
       "      <th>中国</th>\n",
       "      <td>22</td>\n",
       "      <td>17960</td>\n",
       "      <td>816.363636</td>\n",
       "      <td>2018</td>\n",
       "      <td>2002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>媒体和娱乐</th>\n",
       "      <th>中国</th>\n",
       "      <td>17</td>\n",
       "      <td>8230</td>\n",
       "      <td>484.117647</td>\n",
       "      <td>2015</td>\n",
       "      <td>2003</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>云计算</th>\n",
       "      <th>美国</th>\n",
       "      <td>32</td>\n",
       "      <td>6880</td>\n",
       "      <td>215.000000</td>\n",
       "      <td>2015</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>共享经济</th>\n",
       "      <th>美国</th>\n",
       "      <td>6</td>\n",
       "      <td>5670</td>\n",
       "      <td>945.000000</td>\n",
       "      <td>2017</td>\n",
       "      <td>2008</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>金融科技</th>\n",
       "      <th>美国</th>\n",
       "      <td>21</td>\n",
       "      <td>5020</td>\n",
       "      <td>239.047619</td>\n",
       "      <td>2017</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>房地产科技</th>\n",
       "      <th>菲律宾</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2015</td>\n",
       "      <td>2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>物流</th>\n",
       "      <th>哥伦比亚</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>游戏</th>\n",
       "      <th>印度</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2012</td>\n",
       "      <td>2012</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>消费品</th>\n",
       "      <th>芬兰</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2016</td>\n",
       "      <td>2016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>金融科技</th>\n",
       "      <th>韩国</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2011</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           企业名称 估值（亿人民币）              成立年份      \n",
       "             数量       总和          均值    最新    最早\n",
       "行业    国家                                        \n",
       "金融科技  中国     22    17960  816.363636  2018  2002\n",
       "媒体和娱乐 中国     17     8230  484.117647  2015  2003\n",
       "云计算   美国     32     6880  215.000000  2015  2000\n",
       "共享经济  美国      6     5670  945.000000  2017  2008\n",
       "金融科技  美国     21     5020  239.047619  2017  2000\n",
       "...         ...      ...         ...   ...   ...\n",
       "房地产科技 菲律宾     1       70   70.000000  2015  2015\n",
       "物流    哥伦比亚    1       70   70.000000  2016  2016\n",
       "游戏    印度      1       70   70.000000  2012  2012\n",
       "消费品   芬兰      1       70   70.000000  2016  2016\n",
       "金融科技  韩国      1       70   70.000000  2011  2011\n",
       "\n",
       "[103 rows x 5 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# A1 原完整代码\n",
    "先国再行 = df.groupby ( by = ['国家','行业'] ) \\\n",
    "             .agg ({ \"企业名称\" : \"count\", \\\n",
    "                     \"估值（亿人民币）\":[\"sum\",\"mean\"], \\\n",
    "                     \"成立年份\":[\"max\",\"min\"],               }) \\\n",
    "             .sort_values ( by = [(\"估值（亿人民币）\",\"sum\")], ascending = False) \\\n",
    "             .rename ( columns = {\"sum\":\"总和\", \"mean\":\"均值\", \"count\":\"数量\", \"max\":\"最新\", \"min\":\"最早\"} )\n",
    "先行再国 = df.groupby ( by = ['行业', '国家'] ) \\\n",
    "             .agg ({ \"企业名称\" : \"count\", \\\n",
    "                     \"估值（亿人民币）\":[\"sum\",\"mean\"], \\\n",
    "                     \"成立年份\":[\"max\",\"min\"],               }) \\\n",
    "             .sort_values ( by = [(\"估值（亿人民币）\",\"sum\")], ascending = False) \\\n",
    "             .rename ( columns = {\"sum\":\"总和\", \"mean\":\"均值\", \"count\":\"数量\", \"max\":\"最新\", \"min\":\"最早\"} )\n",
    "display(先国再行)\n",
    "display(先行再国)\n",
    "\n",
    "with pd.ExcelWriter(\"20春_pandas_week03_hurun_unicorn.xlsx\") as writer:\n",
    "    先国再行.to_excel(writer,sheet_name=\"先国再行\") \n",
    "    先行再国.to_excel(writer,sheet_name=\"先行再国\") "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A1-Extra 完整代码，多来2页：先国再城，先行再城\n",
    "\n",
    "# ( 來來來 )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 挑战A2：倒敍rename\n",
    "实习生\"吳名\"说，A1 完整代码的方法都看过，就是数据框最后一行/部分的 .rename()没看过。\n",
    "\n",
    "你要示范给她/她看，有rename 和 没rename的不同，以\"先国再行\"的结果为例\n",
    "\n",
    "原来是 df ----->  先国再行\n",
    "\n",
    "现在是 df ----->  先国再行_中继  ---.rename()-->  先国再行\n",
    "\n",
    "请跑出 \"先国再行_中继\" ，和 \"先国再行\" 用肉眼比较后，向实习生说明rename对於出报表的重要角色。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A2-Extra 完整代码，多来中继，说明rename\n",
    "\n",
    "# ( 來來來 )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 挑战A3：倒敍sort_values\n",
    "实习生\"欒序\"说，A1 完整代码的方法都看过，就是数据框倒數第2行/部分的 .sort_values()没看过。\n",
    "\n",
    "你要示范给她/她看，有sort_values 和 没sort_values的不同，以\"先国再行\"的结果为例\n",
    "\n",
    "原来是 df ----->  先国再行\n",
    "\n",
    "现在是 df ----->  先国再行_中继A  ---.有sort_values()-->   先国再行_中继B  ---.rename()-->  先国再行\n",
    "\n",
    "请跑出 \"先国再行_中继A\" ，和 \"先国再行_中继B\" 用肉眼比较后，向实习生说明sort_values对於出报表的重要角色。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A3-Extra 完整代码，多来中继，说明rename\n",
    "\n",
    "# ( 來來來 )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 挑战A4：agg 参数 的报表顺序\n",
    "老板说，上次的EXCEL表，客服说要调整一下顺序，比较专业。\n",
    "\n",
    "请你在已有的代码基础上，直接改agg 参数 来调整报表顺序。\n",
    "\n",
    "原来顺序是：\n",
    "* (    '企业名称', '数量')\n",
    "* ('估值（亿人民币）', '总和')\n",
    "* ('估值（亿人民币）', '均值')\n",
    "* (    '成立年份', '最新')\n",
    "* (    '成立年份', '最早')\n",
    "\n",
    "请改顺序为：\n",
    "* (    '成立年份', '最早')\n",
    "* (    '成立年份', '最新')\n",
    "* (    '企业名称', '数量')\n",
    "* ('估值（亿人民币）', '均值')\n",
    "* ('估值（亿人民币）', '总和')\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A4-Extra agg 参数 的报表顺序\n",
    "\n",
    "# ( 來來來 )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 小结\n",
    "* 以上代码操作，已经具体而微的把pandas的出报表的重要参数都练过了\n",
    "* 你能总结一下出报表3剑法？\n",
    "* 你能总结一下所有参数的各别意义吗？\n",
    "  * 分进合击之pandas剑法\n",
    "* 你能总结描述一下整个流程吗？\n",
    "  * 分进合击之数据科学心法\n",
    "  \n",
    "#### 小坑/小风格\n",
    "* 代码某几行最后一个字符有 \\，指的是什麽意思？\n",
    "* 代码某几行最后一个字符有 \\，为什麽要用？给机器还是人用的？\n",
    "* 代码某几行最后一个字符有 \\，若后面多了空白会怎麽样？"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"bg-split\"></div>\n",
    "\n",
    "## 分分分\n",
    "\n",
    "> <mark>分分分</mark>，接续上周的**切切切**切片 (英文叫slice)，groupby的分分分，是数据科学家将**切切切**的数据解剖刀，在找突破点的后，系统地把全数据拆分多块。<mark>大卸八块</mark>后好分迸合击。要如何分，不只是要会df.groupby的参数始使用，更是开展对知识领域丶统计丶及数据管理的数据形态及标准的数据感修练之旅\n",
    "\n",
    "-----\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 将对象分成组\n",
    "\n",
    "* 本节尝试完成：\n",
    "    * A1-Extra 完整代码，多来2页：先国再城，先行再城\n",
    "    * ( 來來來 )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>排名</th>\n",
       "      <th>企业名称</th>\n",
       "      <th>Company Name</th>\n",
       "      <th>估值（亿人民币）</th>\n",
       "      <th>国家</th>\n",
       "      <th>城市</th>\n",
       "      <th>行业</th>\n",
       "      <th>掌门人/创始人</th>\n",
       "      <th>成立年份</th>\n",
       "      <th>部分投资机构</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>蚂蚁金服</td>\n",
       "      <td>Ant Financial</td>\n",
       "      <td>10000</td>\n",
       "      <td>中国</td>\n",
       "      <td>杭州</td>\n",
       "      <td>金融科技</td>\n",
       "      <td>井贤栋</td>\n",
       "      <td>2014</td>\n",
       "      <td>春华资本、中投海外、红杉资本</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>字节跳动</td>\n",
       "      <td>Bytedance</td>\n",
       "      <td>5000</td>\n",
       "      <td>中国</td>\n",
       "      <td>北京</td>\n",
       "      <td>媒体和娱乐</td>\n",
       "      <td>张一鸣</td>\n",
       "      <td>2012</td>\n",
       "      <td>红杉资本、海纳亚洲、纪源资本、启明创投</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>滴滴出行</td>\n",
       "      <td>Didi Chuxing</td>\n",
       "      <td>3600</td>\n",
       "      <td>中国</td>\n",
       "      <td>北京</td>\n",
       "      <td>共享经济</td>\n",
       "      <td>程维</td>\n",
       "      <td>2012</td>\n",
       "      <td>腾讯、阿里巴巴、红杉资本、经纬中国、纪源资本</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Infor</td>\n",
       "      <td>Infor</td>\n",
       "      <td>3500</td>\n",
       "      <td>美国</td>\n",
       "      <td>纽约</td>\n",
       "      <td>云计算</td>\n",
       "      <td>Jim Schaper</td>\n",
       "      <td>2002</td>\n",
       "      <td>Golden Gate Capital, Koch Equity Development</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>JUUL Labs</td>\n",
       "      <td>JUUL Labs</td>\n",
       "      <td>3400</td>\n",
       "      <td>美国</td>\n",
       "      <td>旧金山</td>\n",
       "      <td>消费品</td>\n",
       "      <td>Adam Bowen, James Monsees, Kevin Burns, Tim Da...</td>\n",
       "      <td>2015</td>\n",
       "      <td>M13, Timothy Davis, Evolution VC Partners, Tig...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   排名       企业名称   Company Name  估值（亿人民币）  国家   城市     行业  \\\n",
       "0   1       蚂蚁金服  Ant Financial     10000  中国   杭州   金融科技   \n",
       "1   2       字节跳动      Bytedance      5000  中国   北京  媒体和娱乐   \n",
       "2   3       滴滴出行   Didi Chuxing      3600  中国   北京   共享经济   \n",
       "3   4      Infor          Infor      3500  美国   纽约    云计算   \n",
       "4   5  JUUL Labs      JUUL Labs      3400  美国  旧金山    消费品   \n",
       "\n",
       "                                             掌门人/创始人  成立年份  \\\n",
       "0                                                井贤栋  2014   \n",
       "1                                                张一鸣  2012   \n",
       "2                                                 程维  2012   \n",
       "3                                        Jim Schaper  2002   \n",
       "4  Adam Bowen, James Monsees, Kevin Burns, Tim Da...  2015   \n",
       "\n",
       "                                              部分投资机构  \n",
       "0                                     春华资本、中投海外、红杉资本  \n",
       "1                                红杉资本、海纳亚洲、纪源资本、启明创投  \n",
       "2                             腾讯、阿里巴巴、红杉资本、经纬中国、纪源资本  \n",
       "3       Golden Gate Capital, Koch Equity Development  \n",
       "4  M13, Timothy Davis, Evolution VC Partners, Tig...  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002814D4D2788>"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# A string passed to groupby can refer to a column name or an index level name\n",
    "按国家分 = df.groupby(\"国家\")\n",
    "按国家分"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>排名</th>\n",
       "      <th>企业名称</th>\n",
       "      <th>Company Name</th>\n",
       "      <th>估值（亿人民币）</th>\n",
       "      <th>城市</th>\n",
       "      <th>行业</th>\n",
       "      <th>掌门人/创始人</th>\n",
       "      <th>成立年份</th>\n",
       "      <th>部分投资机构</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>国家</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>中国</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>以色列</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>卢森堡</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>印度</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>印度尼西亚</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>哥伦比亚</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>巴西</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>德国</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>新加坡</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>日本</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>法国</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>澳大利亚</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>爱尔兰</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>爱沙尼亚</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>瑞典</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>瑞士</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>美国</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>芬兰</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>英国</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>菲律宾</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>西班牙</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>阿根廷</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>韩国</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>马耳他</th>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>object</td>\n",
       "      <td>int64</td>\n",
       "      <td>object</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          排名    企业名称 Company Name 估值（亿人民币）      城市      行业 掌门人/创始人   成立年份  \\\n",
       "国家                                                                          \n",
       "中国     int64  object       object    int64  object  object  object  int64   \n",
       "以色列    int64  object       object    int64  object  object  object  int64   \n",
       "卢森堡    int64  object       object    int64  object  object  object  int64   \n",
       "印度     int64  object       object    int64  object  object  object  int64   \n",
       "印度尼西亚  int64  object       object    int64  object  object  object  int64   \n",
       "哥伦比亚   int64  object       object    int64  object  object  object  int64   \n",
       "巴西     int64  object       object    int64  object  object  object  int64   \n",
       "德国     int64  object       object    int64  object  object  object  int64   \n",
       "新加坡    int64  object       object    int64  object  object  object  int64   \n",
       "日本     int64  object       object    int64  object  object  object  int64   \n",
       "法国     int64  object       object    int64  object  object  object  int64   \n",
       "澳大利亚   int64  object       object    int64  object  object  object  int64   \n",
       "爱尔兰    int64  object       object    int64  object  object  object  int64   \n",
       "爱沙尼亚   int64  object       object    int64  object  object  object  int64   \n",
       "瑞典     int64  object       object    int64  object  object  object  int64   \n",
       "瑞士     int64  object       object    int64  object  object  object  int64   \n",
       "美国     int64  object       object    int64  object  object  object  int64   \n",
       "芬兰     int64  object       object    int64  object  object  object  int64   \n",
       "英国     int64  object       object    int64  object  object  object  int64   \n",
       "菲律宾    int64  object       object    int64  object  object  object  int64   \n",
       "西班牙    int64  object       object    int64  object  object  object  int64   \n",
       "阿根廷    int64  object       object    int64  object  object  object  int64   \n",
       "韩国     int64  object       object    int64  object  object  object  int64   \n",
       "马耳他    int64  object       object    int64  object  object  object  int64   \n",
       "\n",
       "       部分投资机构  \n",
       "国家             \n",
       "中国     object  \n",
       "以色列    object  \n",
       "卢森堡    object  \n",
       "印度     object  \n",
       "印度尼西亚  object  \n",
       "哥伦比亚   object  \n",
       "巴西     object  \n",
       "德国     object  \n",
       "新加坡    object  \n",
       "日本     object  \n",
       "法国     object  \n",
       "澳大利亚   object  \n",
       "爱尔兰    object  \n",
       "爱沙尼亚   object  \n",
       "瑞典     object  \n",
       "瑞士     object  \n",
       "美国     object  \n",
       "芬兰     object  \n",
       "英国     object  \n",
       "菲律宾    object  \n",
       "西班牙    object  \n",
       "阿根廷    object  \n",
       "韩国     object  \n",
       "马耳他    object  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "按国家分.dtypes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002814D6E7D88>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "按行业分 = df.groupby(\"行业\")\n",
    "按行业分"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000002814D6E7688>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Group by several keys at once (国家, then 行业); aggregated results get a MultiIndex\n",
    "按国家行业分 = df.groupby([\"国家\",\"行业\"])\n",
    "按国家行业分"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>排名</th>\n",
       "      <th>估值（亿人民币）</th>\n",
       "      <th>成立年份</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>国家</th>\n",
       "      <th>行业</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">中国</th>\n",
       "      <th>云计算</th>\n",
       "      <td>230.800000</td>\n",
       "      <td>92.000000</td>\n",
       "      <td>2012.400000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>人工智能</th>\n",
       "      <td>189.333333</td>\n",
       "      <td>139.333333</td>\n",
       "      <td>2013.466667</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>健康科技</th>\n",
       "      <td>206.538462</td>\n",
       "      <td>158.461538</td>\n",
       "      <td>2011.384615</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>共享经济</th>\n",
       "      <td>148.750000</td>\n",
       "      <td>592.500000</td>\n",
       "      <td>2014.375000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>区块链</th>\n",
       "      <td>116.500000</td>\n",
       "      <td>312.500000</td>\n",
       "      <td>2014.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">韩国</th>\n",
       "      <th>游戏</th>\n",
       "      <td>50.000000</td>\n",
       "      <td>350.000000</td>\n",
       "      <td>2007.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>物流</th>\n",
       "      <td>84.000000</td>\n",
       "      <td>200.000000</td>\n",
       "      <td>2011.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>电子商务</th>\n",
       "      <td>184.333333</td>\n",
       "      <td>246.666667</td>\n",
       "      <td>2008.333333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>金融科技</th>\n",
       "      <td>264.000000</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2011.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>马耳他</th>\n",
       "      <th>区块链</th>\n",
       "      <td>138.000000</td>\n",
       "      <td>150.000000</td>\n",
       "      <td>2017.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                  排名    估值（亿人民币）         成立年份\n",
       "国家  行业                                       \n",
       "中国  云计算   230.800000   92.000000  2012.400000\n",
       "    人工智能  189.333333  139.333333  2013.466667\n",
       "    健康科技  206.538462  158.461538  2011.384615\n",
       "    共享经济  148.750000  592.500000  2014.375000\n",
       "    区块链   116.500000  312.500000  2014.000000\n",
       "...              ...         ...          ...\n",
       "韩国  游戏     50.000000  350.000000  2007.000000\n",
       "    物流     84.000000  200.000000  2011.000000\n",
       "    电子商务  184.333333  246.666667  2008.333333\n",
       "    金融科技  264.000000   70.000000  2011.000000\n",
       "马耳他 区块链   138.000000  150.000000  2017.000000\n",
       "\n",
       "[103 rows x 3 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
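    "# Observe: are the object columns dropped automatically, leaving only the int columns? Think about which kinds of values mean() can average.\n",
    "# Note: this output comes from pandas 1.0.x; on pandas 2.0 and later, call mean(numeric_only=True) to get the same result.\n",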
    "按国家行业分.mean()"
   ]
  },
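  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The question in the comment above can be explored on a much smaller table. The following is a minimal sketch of split-apply-combine, assuming a hypothetical three-row frame (`toy`, with made-up columns `country`, `name` and `valuation`) rather than the unicorn list. Passing `numeric_only=True` makes explicit that only numeric columns are averaged.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data, not taken from the unicorn list\n",
    "toy = pd.DataFrame({\n",
    "    'country': ['A', 'A', 'B'],\n",
    "    'name': ['x', 'y', 'z'],\n",
    "    'valuation': [100, 300, 50],\n",
    "})\n",
    "\n",
    "# Split the rows by country, apply mean to the numeric column,\n",
    "# and combine the per-group results into one table\n",
    "report = toy.groupby('country').mean(numeric_only=True)\n",
    "print(report)\n",
    "```\n",
    "\n",
    "The text column `name` does not appear in `report`, because a mean is only defined for quantities; categories are what the data is split on."
   ]
  },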
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
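    "### GroupBy object attributes\n",
    "\n",
    "The `groups` attribute is a dict whose keys are the computed unique groups and whose values are the axis labels (here, row index labels) belonging to each group. For the groupings created above, we have:"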
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'中国': Int64Index([  0,   1,   2,   6,  10,  11,  12,  13,  14,  19,\n",
       "             ...\n",
       "             481, 482, 483, 484, 485, 486, 487, 488, 490, 491],\n",
       "            dtype='int64', length=206),\n",
       " '以色列': Int64Index([178, 184, 190, 310, 364, 384, 415], dtype='int64'),\n",
       " '卢森堡': Int64Index([342], dtype='int64'),\n",
       " '印度': Int64Index([ 23,  42,  45,  52,  81, 113, 123, 146, 163, 192, 206, 287, 323,\n",
       "             347, 361, 410, 422, 425, 432, 437, 465],\n",
       "            dtype='int64'),\n",
       " '印度尼西亚': Int64Index([22, 39, 74, 294], dtype='int64'),\n",
       " '哥伦比亚': Int64Index([427], dtype='int64'),\n",
       " '巴西': Int64Index([66, 344, 358, 389], dtype='int64'),\n",
       " '德国': Int64Index([56, 108, 160, 169, 269, 339, 411], dtype='int64'),\n",
       " '新加坡': Int64Index([15, 50], dtype='int64'),\n",
       " '日本': Int64Index([202, 388], dtype='int64'),\n",
       " '法国': Int64Index([149, 317, 320, 396], dtype='int64'),\n",
       " '澳大利亚': Int64Index([88], dtype='int64'),\n",
       " '爱尔兰': Int64Index([182], dtype='int64'),\n",
       " '爱沙尼亚': Int64Index([290], dtype='int64'),\n",
       " '瑞典': Int64Index([62, 196], dtype='int64'),\n",
       " '瑞士': Int64Index([36, 167, 400], dtype='int64'),\n",
       " '美国': Int64Index([  3,   4,   5,   7,   8,   9,  16,  17,  18,  21,\n",
       "             ...\n",
       "             463, 466, 469, 470, 471, 472, 474, 489, 492, 493],\n",
       "            dtype='int64', length=203),\n",
       " '芬兰': Int64Index([349], dtype='int64'),\n",
       " '英国': Int64Index([55, 73, 82, 107, 110, 145, 157, 164, 172, 177, 197, 207, 418], dtype='int64'),\n",
       " '菲律宾': Int64Index([431], dtype='int64'),\n",
       " '西班牙': Int64Index([297], dtype='int64'),\n",
       " '阿根廷': Int64Index([281], dtype='int64'),\n",
       " '韩国': Int64Index([26, 49, 129, 458, 459, 479], dtype='int64'),\n",
       " '马耳他': Int64Index([147], dtype='int64')}"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "按国家分.groups"
   ]
  },
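  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `groups` output above is long, so the mapping from group key to row labels can be easier to read on a tiny table. The following is a sketch under stated assumptions: `toy` and its columns `country` and `valuation` are hypothetical and not part of the course data.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data, not taken from the unicorn list\n",
    "toy = pd.DataFrame({\n",
    "    'country': ['A', 'B', 'A', 'C'],\n",
    "    'valuation': [100, 50, 300, 20],\n",
    "})\n",
    "by_country = toy.groupby('country')\n",
    "\n",
    "# groups maps each group key to the row labels that belong to it\n",
    "labels = {key: idx.tolist() for key, idx in by_country.groups.items()}\n",
    "print(labels)\n",
    "\n",
    "# get_group returns the rows of a single group as a DataFrame\n",
    "rows_a = by_country.get_group('A')\n",
    "print(rows_a)\n",
    "\n",
    "# size counts the rows in each group\n",
    "counts = by_country.size()\n",
    "print(counts)\n",
    "```\n",
    "\n",
    "Reading `groups` this way shows that the split step only records which rows belong to which key; no values are computed until an aggregation such as `mean()` or `size()` is applied."
   ]
  },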
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'3D印刷': Int64Index([155, 165, 335], dtype='int64'),\n",
       " '云计算': Int64Index([  3,  70,  88,  92, 119, 142, 170, 174, 178, 180, 182, 183, 190,\n",
       "             208, 209, 211, 212, 219, 251, 270, 272, 281, 282, 308, 310, 319,\n",
       "             324, 325, 329, 341, 357, 366, 367, 384, 397, 405, 416, 417, 446,\n",
       "             460, 464, 467, 470, 474],\n",
       "            dtype='int64'),\n",
       " '人工智能': Int64Index([ 33,  41,  46,  63,  84,  91, 102, 135, 138, 143, 145, 153, 161,\n",
       "             162, 172, 179, 200, 202, 216, 218, 241, 255, 267, 285, 296, 307,\n",
       "             315, 337, 396, 401, 402, 403, 414, 415, 436, 444, 450, 456, 463,\n",
       "             489],\n",
       "            dtype='int64'),\n",
       " '健康科技': Int64Index([ 27,  48,  77,  98, 112, 121, 166, 175, 210, 221, 234, 245, 277,\n",
       "             279, 295, 298, 305, 320, 326, 344, 345, 356, 380, 398, 412, 424,\n",
       "             454],\n",
       "            dtype='int64'),\n",
       " '共享经济': Int64Index([  2,   5,   8,  15,  22,  40,  45,  47,  52,  99, 127, 148, 149,\n",
       "             186, 224, 254, 290, 297, 378, 410, 438, 462],\n",
       "            dtype='int64'),\n",
       " '区块链': Int64Index([19, 29, 53, 87, 90, 147, 150, 167, 226, 289, 388], dtype='int64'),\n",
       " '即时通讯': Int64Index([168, 194, 347, 362, 448, 452], dtype='int64'),\n",
       " '大数据': Int64Index([ 17,  71,  94, 192, 232, 244, 250, 301, 309, 321, 338, 346, 352,\n",
       "             372, 385, 394, 451, 461],\n",
       "            dtype='int64'),\n",
       " '媒体和娱乐': Int64Index([  1,  13,  16,  85,  93,  95, 101, 117, 132, 151, 154, 204, 213,\n",
       "             214, 220, 235, 242, 260, 275, 317, 387, 392, 471, 482],\n",
       "            dtype='int64'),\n",
       " '房地产科技': Int64Index([24, 80, 104, 115, 225, 240, 259, 332, 419, 431, 443, 468, 472], dtype='int64'),\n",
       " '教育科技': Int64Index([42, 128, 134, 136, 229, 265, 271, 312, 313, 353, 354, 365, 377,\n",
       "             466, 490],\n",
       "            dtype='int64'),\n",
       " '新能源': Int64Index([196, 206, 302, 328, 399, 418, 423, 435, 440, 469], dtype='int64'),\n",
       " '新能源汽车': Int64Index([54, 59, 78, 79, 120, 133, 152, 158, 223, 227, 249, 292, 351, 381,\n",
       "             406],\n",
       "            dtype='int64'),\n",
       " '新零售': Int64Index([176, 189, 266, 276, 284, 287, 299, 300, 375, 407, 447], dtype='int64'),\n",
       " '机器人': Int64Index([14, 76, 109, 243], dtype='int64'),\n",
       " '消费品': Int64Index([4, 69, 195, 198, 205, 217, 257, 343, 349, 420, 442, 455], dtype='int64'),\n",
       " '游戏': Int64Index([49, 51, 118, 126, 140, 177, 230, 322, 323], dtype='int64'),\n",
       " '物流': Int64Index([ 11,  18,  20,  31,  43,  64,  81,  97, 105, 123, 129, 163, 164,\n",
       "             171, 181, 201, 222, 246, 248, 278, 311, 334, 358, 371, 374, 379,\n",
       "             389, 390, 427, 432, 481, 484, 488, 492],\n",
       "            dtype='int64'),\n",
       " '生命科学': Int64Index([ 21,  30,  36,  61,  68, 100, 124, 137, 160, 193, 197, 258, 264,\n",
       "             340, 355, 364, 408, 441],\n",
       "            dtype='int64'),\n",
       " '电子商务': Int64Index([ 25,  26,  28,  34,  39,  50,  55,  56,  57,  60,  67,  74,  75,\n",
       "             103, 106, 113, 122, 131, 169, 187, 215, 231, 236, 237, 238, 239,\n",
       "             247, 253, 256, 262, 269, 280, 286, 291, 294, 303, 304, 330, 331,\n",
       "             333, 339, 342, 348, 363, 368, 369, 370, 382, 395, 409, 411, 421,\n",
       "             425, 428, 430, 437, 445, 453, 457, 458, 465, 477, 479, 480, 483,\n",
       "             485, 491, 493],\n",
       "            dtype='int64'),\n",
       " '网络安全': Int64Index([38, 116, 306, 360, 376, 391, 413], dtype='int64'),\n",
       " '航天': Int64Index([7, 111, 433], dtype='int64'),\n",
       " '虚拟与增强现实': Int64Index([44, 65, 400], dtype='int64'),\n",
       " '软件与服务': Int64Index([130, 139, 141, 184, 203, 228, 233, 252, 261, 293, 314, 316, 336,\n",
       "             350, 359, 361, 393, 449, 476, 478, 487],\n",
       "            dtype='int64'),\n",
       " '金融科技': Int64Index([  0,   6,   9,  10,  12,  23,  32,  35,  37,  58,  62,  66,  72,\n",
       "              73,  82,  83,  86,  89,  96, 107, 108, 110, 114, 125, 144, 146,\n",
       "             156, 157, 159, 173, 185, 188, 191, 199, 207, 263, 268, 273, 274,\n",
       "             283, 288, 318, 327, 373, 383, 386, 404, 422, 426, 429, 434, 439,\n",
       "             459, 473, 475, 486],\n",
       "            dtype='int64')}"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "按行业分.groups"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{('中国', '云计算'): Int64Index([183, 251, 325, 464, 467], dtype='int64'),\n",
       " ('中国',\n",
       "  '人工智能'): Int64Index([46, 63, 91, 102, 153, 218, 241, 255, 267, 285, 337, 401, 402, 403,\n",
       "             414],\n",
       "            dtype='int64'),\n",
       " ('中国',\n",
       "  '健康科技'): Int64Index([27, 48, 77, 234, 245, 279, 305, 326, 345, 356, 380, 398, 454], dtype='int64'),\n",
       " ('中国',\n",
       "  '共享经济'): Int64Index([2, 47, 99, 127, 224, 254, 378, 438], dtype='int64'),\n",
       " ('中国', '区块链'): Int64Index([19, 87, 150, 226], dtype='int64'),\n",
       " ('中国',\n",
       "  '大数据'): Int64Index([232, 244, 250, 321, 338, 352, 372, 385, 451], dtype='int64'),\n",
       " ('中国',\n",
       "  '媒体和娱乐'): Int64Index([1, 13, 85, 93, 95, 101, 132, 154, 214, 220, 235, 242, 260, 275,\n",
       "             387, 392, 482],\n",
       "            dtype='int64'),\n",
       " ('中国', '房地产科技'): Int64Index([24, 80, 225, 240, 259, 332, 468], dtype='int64'),\n",
       " ('中国',\n",
       "  '教育科技'): Int64Index([128, 134, 136, 229, 265, 313, 353, 354, 365, 377, 490], dtype='int64'),\n",
       " ('中国', '新能源'): Int64Index([328, 423], dtype='int64'),\n",
       " ('中国',\n",
       "  '新能源汽车'): Int64Index([78, 79, 120, 133, 152, 158, 223, 227, 249, 292, 351, 381], dtype='int64'),\n",
       " ('中国', '新零售'): Int64Index([189, 266, 299, 407], dtype='int64'),\n",
       " ('中国', '机器人'): Int64Index([14, 76, 243], dtype='int64'),\n",
       " ('中国', '消费品'): Int64Index([69, 205, 257, 442], dtype='int64'),\n",
       " ('中国', '游戏'): Int64Index([230], dtype='int64'),\n",
       " ('中国',\n",
       "  '物流'): Int64Index([11, 20, 43, 64, 105, 181, 246, 248, 278, 334, 371, 379, 390, 481,\n",
       "             484, 488],\n",
       "            dtype='int64'),\n",
       " ('中国', '生命科学'): Int64Index([100, 258, 408, 441], dtype='int64'),\n",
       " ('中国',\n",
       "  '电子商务'): Int64Index([ 25,  34, 103, 106, 122, 131, 187, 215, 231, 236, 237, 238, 239,\n",
       "             247, 253, 256, 262, 286, 291, 303, 304, 333, 363, 368, 369, 370,\n",
       "             421, 428, 477, 480, 483, 485, 491],\n",
       "            dtype='int64'),\n",
       " ('中国', '网络安全'): Int64Index([116], dtype='int64'),\n",
       " ('中国',\n",
       "  '软件与服务'): Int64Index([130, 139, 141, 228, 233, 252, 261, 314, 336, 350, 359, 393, 476,\n",
       "             478, 487],\n",
       "            dtype='int64'),\n",
       " ('中国',\n",
       "  '金融科技'): Int64Index([  0,   6,  10,  12,  35,  37,  89,  96, 199, 263, 268, 273, 274,\n",
       "             318, 327, 383, 386, 429, 439, 473, 475, 486],\n",
       "            dtype='int64'),\n",
       " ('以色列', '云计算'): Int64Index([178, 190, 310, 384], dtype='int64'),\n",
       " ('以色列', '人工智能'): Int64Index([415], dtype='int64'),\n",
       " ('以色列', '生命科学'): Int64Index([364], dtype='int64'),\n",
       " ('以色列', '软件与服务'): Int64Index([184], dtype='int64'),\n",
       " ('卢森堡', '电子商务'): Int64Index([342], dtype='int64'),\n",
       " ('印度', '共享经济'): Int64Index([45, 52, 410], dtype='int64'),\n",
       " ('印度', '即时通讯'): Int64Index([347], dtype='int64'),\n",
       " ('印度', '大数据'): Int64Index([192], dtype='int64'),\n",
       " ('印度', '教育科技'): Int64Index([42], dtype='int64'),\n",
       " ('印度', '新能源'): Int64Index([206], dtype='int64'),\n",
       " ('印度', '新零售'): Int64Index([287], dtype='int64'),\n",
       " ('印度', '游戏'): Int64Index([323], dtype='int64'),\n",
       " ('印度', '物流'): Int64Index([81, 123, 163, 432], dtype='int64'),\n",
       " ('印度', '电子商务'): Int64Index([113, 425, 437, 465], dtype='int64'),\n",
       " ('印度', '软件与服务'): Int64Index([361], dtype='int64'),\n",
       " ('印度', '金融科技'): Int64Index([23, 146, 422], dtype='int64'),\n",
       " ('印度尼西亚', '共享经济'): Int64Index([22], dtype='int64'),\n",
       " ('印度尼西亚', '电子商务'): Int64Index([39, 74, 294], dtype='int64'),\n",
       " ('哥伦比亚', '物流'): Int64Index([427], dtype='int64'),\n",
       " ('巴西', '健康科技'): Int64Index([344], dtype='int64'),\n",
       " ('巴西', '物流'): Int64Index([358, 389], dtype='int64'),\n",
       " ('巴西', '金融科技'): Int64Index([66], dtype='int64'),\n",
       " ('德国', '生命科学'): Int64Index([160], dtype='int64'),\n",
       " ('德国', '电子商务'): Int64Index([56, 169, 269, 339, 411], dtype='int64'),\n",
       " ('德国', '金融科技'): Int64Index([108], dtype='int64'),\n",
       " ('新加坡', '共享经济'): Int64Index([15], dtype='int64'),\n",
       " ('新加坡', '电子商务'): Int64Index([50], dtype='int64'),\n",
       " ('日本', '人工智能'): Int64Index([202], dtype='int64'),\n",
       " ('日本', '区块链'): Int64Index([388], dtype='int64'),\n",
       " ('法国', '人工智能'): Int64Index([396], dtype='int64'),\n",
       " ('法国', '健康科技'): Int64Index([320], dtype='int64'),\n",
       " ('法国', '共享经济'): Int64Index([149], dtype='int64'),\n",
       " ('法国', '媒体和娱乐'): Int64Index([317], dtype='int64'),\n",
       " ('澳大利亚', '云计算'): Int64Index([88], dtype='int64'),\n",
       " ('爱尔兰', '云计算'): Int64Index([182], dtype='int64'),\n",
       " ('爱沙尼亚', '共享经济'): Int64Index([290], dtype='int64'),\n",
       " ('瑞典', '新能源'): Int64Index([196], dtype='int64'),\n",
       " ('瑞典', '金融科技'): Int64Index([62], dtype='int64'),\n",
       " ('瑞士', '区块链'): Int64Index([167], dtype='int64'),\n",
       " ('瑞士', '生命科学'): Int64Index([36], dtype='int64'),\n",
       " ('瑞士', '虚拟与增强现实'): Int64Index([400], dtype='int64'),\n",
       " ('美国', '3D印刷'): Int64Index([155, 165, 335], dtype='int64'),\n",
       " ('美国',\n",
       "  '云计算'): Int64Index([  3,  70,  92, 119, 142, 170, 174, 180, 208, 209, 211, 212, 219,\n",
       "             270, 272, 282, 308, 319, 324, 329, 341, 357, 366, 367, 397, 405,\n",
       "             416, 417, 446, 460, 470, 474],\n",
       "            dtype='int64'),\n",
       " ('美国',\n",
       "  '人工智能'): Int64Index([ 33,  41,  84, 135, 138, 143, 161, 162, 179, 200, 216, 296, 307,\n",
       "             315, 436, 444, 450, 456, 463, 489],\n",
       "            dtype='int64'),\n",
       " ('美国',\n",
       "  '健康科技'): Int64Index([98, 112, 121, 166, 175, 210, 221, 277, 295, 298, 412, 424], dtype='int64'),\n",
       " ('美国', '共享经济'): Int64Index([5, 8, 40, 148, 186, 462], dtype='int64'),\n",
       " ('美国', '区块链'): Int64Index([29, 53, 90, 289], dtype='int64'),\n",
       " ('美国', '即时通讯'): Int64Index([168, 194, 362, 448, 452], dtype='int64'),\n",
       " ('美国',\n",
       "  '大数据'): Int64Index([17, 71, 94, 301, 309, 346, 394, 461], dtype='int64'),\n",
       " ('美国', '媒体和娱乐'): Int64Index([16, 117, 151, 204, 213, 471], dtype='int64'),\n",
       " ('美国', '房地产科技'): Int64Index([104, 115, 419, 443, 472], dtype='int64'),\n",
       " ('美国', '教育科技'): Int64Index([271, 312, 466], dtype='int64'),\n",
       " ('美国', '新能源'): Int64Index([302, 399, 435, 440, 469], dtype='int64'),\n",
       " ('美国', '新能源汽车'): Int64Index([54, 59, 406], dtype='int64'),\n",
       " ('美国', '新零售'): Int64Index([176, 276, 284, 300, 375, 447], dtype='int64'),\n",
       " ('美国', '机器人'): Int64Index([109], dtype='int64'),\n",
       " ('美国', '消费品'): Int64Index([4, 195, 198, 217, 343, 420, 455], dtype='int64'),\n",
       " ('美国', '游戏'): Int64Index([51, 118, 126, 140, 322], dtype='int64'),\n",
       " ('美国',\n",
       "  '物流'): Int64Index([18, 31, 97, 171, 201, 222, 311, 374, 492], dtype='int64'),\n",
       " ('美国',\n",
       "  '生命科学'): Int64Index([21, 30, 61, 68, 124, 137, 193, 264, 340, 355], dtype='int64'),\n",
       " ('美国',\n",
       "  '电子商务'): Int64Index([28, 57, 60, 67, 75, 280, 330, 331, 348, 382, 395, 409, 430, 445,\n",
       "             453, 457, 493],\n",
       "            dtype='int64'),\n",
       " ('美国', '网络安全'): Int64Index([38, 306, 360, 376, 391, 413], dtype='int64'),\n",
       " ('美国', '航天'): Int64Index([7, 111, 433], dtype='int64'),\n",
       " ('美国', '虚拟与增强现实'): Int64Index([44, 65], dtype='int64'),\n",
       " ('美国', '软件与服务'): Int64Index([203, 293, 316, 449], dtype='int64'),\n",
       " ('美国',\n",
       "  '金融科技'): Int64Index([  9,  32,  58,  72,  83,  86, 114, 125, 144, 156, 159, 173, 185,\n",
       "             188, 191, 283, 288, 373, 404, 426, 434],\n",
       "            dtype='int64'),\n",
       " ('芬兰', '消费品'): Int64Index([349], dtype='int64'),\n",
       " ('英国', '人工智能'): Int64Index([145, 172], dtype='int64'),\n",
       " ('英国', '新能源'): Int64Index([418], dtype='int64'),\n",
       " ('英国', '游戏'): Int64Index([177], dtype='int64'),\n",
       " ('英国', '物流'): Int64Index([164], dtype='int64'),\n",
       " ('英国', '生命科学'): Int64Index([197], dtype='int64'),\n",
       " ('英国', '电子商务'): Int64Index([55], dtype='int64'),\n",
       " ('英国', '金融科技'): Int64Index([73, 82, 107, 110, 157, 207], dtype='int64'),\n",
       " ('菲律宾', '房地产科技'): Int64Index([431], dtype='int64'),\n",
       " ('西班牙', '共享经济'): Int64Index([297], dtype='int64'),\n",
       " ('阿根廷', '云计算'): Int64Index([281], dtype='int64'),\n",
       " ('韩国', '游戏'): Int64Index([49], dtype='int64'),\n",
       " ('韩国', '物流'): Int64Index([129], dtype='int64'),\n",
       " ('韩国', '电子商务'): Int64Index([26, 458, 479], dtype='int64'),\n",
       " ('韩国', '金融科技'): Int64Index([459], dtype='int64'),\n",
       " ('马耳他', '区块链'): Int64Index([147], dtype='int64')}"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Note: with several grouping levels, each key in .groups is a tuple\n",
    "按国家行业分.groups"
   ]
  },
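  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tuple keys in the output above can be reproduced on a small table. This is a minimal sketch under assumptions: the frame `toy` and its columns `country`, `industry`, and `valuation` are hypothetical stand-ins, not the unicorn data used in this handout.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data; names and values are illustrative only\n",
    "toy = pd.DataFrame({\n",
    "    \"country\": [\"US\", \"US\", \"CN\", \"CN\", \"CN\"],\n",
    "    \"industry\": [\"AI\", \"Cloud\", \"AI\", \"AI\", \"Cloud\"],\n",
    "    \"valuation\": [100, 200, 300, 400, 500],\n",
    "})\n",
    "\n",
    "by_country_industry = toy.groupby([\"country\", \"industry\"])\n",
    "\n",
    "# With two grouping columns, every key in .groups is a (country, industry) tuple\n",
    "group_keys = sorted(by_country_industry.groups.keys())\n",
    "print(group_keys)\n",
    "\n",
    "# get_group takes the same kind of tuple and returns the matching rows\n",
    "cn_ai = by_country_industry.get_group((\"CN\", \"AI\"))\n",
    "print(cn_ai)\n",
    "```\n",
    "\n",
    "Because each key is a tuple, `get_group` also expects a tuple whenever the grouping uses more than one column."
   ]
  },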
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
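    "<div class=\"bg-apply\"></div>\n",
    "\n",
    "## Apply, Apply, Apply\n",
    "\n",
    "> <mark>Apply, apply, apply</mark>: calculations that used to run over the whole dataset can now run separately on each group. Which calculations are commonly used? Which data types do they apply to? And where can you look up the aggregation options commonly used for building reports?\n",
    "\n",
    "Common aggregation options for building reports:\n",
    "\n",
    "* [Data aggregation options in Tableau](https://help.tableau.com/current/pro/desktop/zh-cn/calculations_aggregation.htm)\n",
    "* [Data aggregation options in Python](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.aggregate.html): `agg` accepts any function, not only the built-in names\n",
    "* [How to harness the power of Python in Tableau?](https://ask.hellobi.com/blog/gaokuoup/8430)\n",
    "* [黑pandas?](https://zhuanlan.zhihu.com/p/88143921)\n",
    "\n",
    "-----\n"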
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Calculation options\n",
    "Use `df.info()` to check which columns have a Dtype other than `object`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 494 entries, 0 to 493\n",
      "Data columns (total 10 columns):\n",
      " #   Column        Non-Null Count  Dtype \n",
      "---  ------        --------------  ----- \n",
      " 0   排名            494 non-null    int64 \n",
      " 1   企业名称          494 non-null    object\n",
      " 2   Company Name  494 non-null    object\n",
      " 3   估值（亿人民币）      494 non-null    int64 \n",
      " 4   国家            494 non-null    object\n",
      " 5   城市            494 non-null    object\n",
      " 6   行业            494 non-null    object\n",
      " 7   掌门人/创始人       494 non-null    object\n",
      " 8   成立年份          494 non-null    int64 \n",
      " 9   部分投资机构        494 non-null    object\n",
      "dtypes: int64(3), object(7)\n",
      "memory usage: 38.7+ KB\n"
     ]
    }
   ],
   "source": [
    "# C1\n",
    "#df.head()\n",
    "df.info()"
   ]
  },
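  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Instead of reading the `Dtype` column of `df.info()` by eye, the non-`object` columns can also be listed programmatically with `select_dtypes`. This is a minimal sketch under assumptions: the frame `toy` and its column names are hypothetical and only mimic the mix of text and integer columns seen above.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data; names and values are illustrative only\n",
    "toy = pd.DataFrame({\n",
    "    \"name\": [\"A\", \"B\", \"C\"],\n",
    "    \"country\": [\"US\", \"CN\", \"CN\"],\n",
    "    \"valuation\": [100, 200, 300],\n",
    "    \"founded\": [2010, 2015, 2012],\n",
    "})\n",
    "\n",
    "# Numeric columns are the usual candidates for sum, mean, max, min\n",
    "numeric_cols = toy.select_dtypes(include=\"number\").columns.tolist()\n",
    "print(numeric_cols)\n",
    "\n",
    "# The remaining columns are the usual candidates for grouping keys\n",
    "category_cols = toy.select_dtypes(exclude=\"number\").columns.tolist()\n",
    "print(category_cols)\n",
    "```\n",
    "\n",
    "A numeric dtype does not guarantee that every calculation is meaningful: a rank or a year can be compared with `max` and `min`, but summing it rarely makes sense."
   ]
  },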
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "117970\n",
      "117970\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>估值（亿人民币）</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>城市</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>-</th>\n",
       "      <td>370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Burlington Massachussets</th>\n",
       "      <td>150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Emerville</th>\n",
       "      <td>500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Foster City</th>\n",
       "      <td>200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Guilford</th>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>青岛</th>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>首尔</th>\n",
       "      <td>1010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>香港</th>\n",
       "      <td>460</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>马卡迪</th>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>马德里</th>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>120 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                          估值（亿人民币）\n",
       "城市                                \n",
       "-                              370\n",
       "Burlington Massachussets       150\n",
       "Emerville                      500\n",
       "Foster City                    200\n",
       "Guilford                        70\n",
       "...                            ...\n",
       "青岛                             100\n",
       "首尔                            1010\n",
       "香港                             460\n",
       "马卡迪                             70\n",
       "马德里                             70\n",
       "\n",
       "[120 rows x 1 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>估值（亿人民币）</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>城市</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>-</th>\n",
       "      <td>370</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Burlington Massachussets</th>\n",
       "      <td>150</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Emerville</th>\n",
       "      <td>500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Foster City</th>\n",
       "      <td>200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Guilford</th>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>青岛</th>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>首尔</th>\n",
       "      <td>1010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>香港</th>\n",
       "      <td>460</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>马卡迪</th>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>马德里</th>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>120 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                          估值（亿人民币）\n",
       "城市                                \n",
       "-                              370\n",
       "Burlington Massachussets       150\n",
       "Emerville                      500\n",
       "Foster City                    200\n",
       "Guilford                        70\n",
       "...                            ...\n",
       "青岛                             100\n",
       "首尔                            1010\n",
       "香港                             460\n",
       "马卡迪                             70\n",
       "马德里                             70\n",
       "\n",
       "[120 rows x 1 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# C2\n",
    "# Total of a single column: .agg(\"sum\") and .sum() give the same result\n",
    "print(df[\"估值（亿人民币）\"].agg(\"sum\"))\n",
    "print(df[\"估值（亿人民币）\"].sum())\n",
    "\n",
    "# display() from IPython.display renders several DataFrames from one cell\n",
    "from IPython.display import display\n",
    "\n",
    "# Total valuation per city, again written both ways\n",
    "display(df[[\"城市\",\"估值（亿人民币）\"]].groupby(by=\"城市\").sum())\n",
    "display(df[[\"城市\",\"估值（亿人民币）\"]].groupby(by=\"城市\").agg(\"sum\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "max    2019\n",
       "min    2000\n",
       "Name: 成立年份, dtype: int64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# C2 Extra\n",
    "# Try it with \"成立年份\" (founding year): a list of function names returns one value per function\n",
    "\n",
    "df[\"成立年份\"].agg([\"max\",\"min\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"bg-comine\"></div>\n",
    "\n",
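    "## Combine, Combine, Combine\n",
    "> <mark>Combine, combine, combine</mark>: as data scientists, we first picture how the separate columns of data will converge, and design the layout of the report before computing anything.\n",
    "\n",
    "In the demo code and exercises above, the calculations that go into a report are specified through the arguments passed to `.agg`.\n",
    "\n",
    "-----\n",
    "\n",
    "\n",
    "(Imagined) report: compute numbers (counts, valuations, distributions)\n",
    "\n",
    "\n",
    "***Try to complete Challenge A4***"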
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
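    "### Combining agg operations\n",
    "* In 2.3, you may need to combine operations:\n",
    "    1. Compute several functions (`sum`, `count`) on the grouped object `按国家行业分`, i.e. grouped by `[\"国家\",\"行业\"]`\n",
    "    2. Compute function values for several different columns"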
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>sum</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>国家</th>\n",
       "      <th>行业</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">中国</th>\n",
       "      <th>云计算</th>\n",
       "      <td>460</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>人工智能</th>\n",
       "      <td>2090</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>健康科技</th>\n",
       "      <td>2060</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>共享经济</th>\n",
       "      <td>4740</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>区块链</th>\n",
       "      <td>1250</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">韩国</th>\n",
       "      <th>游戏</th>\n",
       "      <td>350</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>物流</th>\n",
       "      <td>200</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>电子商务</th>\n",
       "      <td>740</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>金融科技</th>\n",
       "      <td>70</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>马耳他</th>\n",
       "      <th>区块链</th>\n",
       "      <td>150</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           sum  count\n",
       "国家  行业               \n",
       "中国  云计算    460      5\n",
       "    人工智能  2090     15\n",
       "    健康科技  2060     13\n",
       "    共享经济  4740      8\n",
       "    区块链   1250      4\n",
       "...        ...    ...\n",
       "韩国  游戏     350      1\n",
       "    物流     200      1\n",
       "    电子商务   740      3\n",
       "    金融科技    70      1\n",
       "马耳他 区块链    150      1\n",
       "\n",
       "[103 rows x 2 columns]"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Compute several functions (sum, count) on the grouped object 按国家行业分, i.e. grouped by [\"国家\",\"行业\"]\n",
    "按国家行业分[\"估值（亿人民币）\"].agg([\"sum\",\"count\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>估值（亿人民币）</th>\n",
       "      <th>成立年份</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>国家</th>\n",
       "      <th>行业</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">中国</th>\n",
       "      <th>云计算</th>\n",
       "      <td>150</td>\n",
       "      <td>2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>人工智能</th>\n",
       "      <td>400</td>\n",
       "      <td>2016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>健康科技</th>\n",
       "      <td>600</td>\n",
       "      <td>2019</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>共享经济</th>\n",
       "      <td>3600</td>\n",
       "      <td>2016</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>区块链</th>\n",
       "      <td>800</td>\n",
       "      <td>2017</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">韩国</th>\n",
       "      <th>游戏</th>\n",
       "      <td>350</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>物流</th>\n",
       "      <td>200</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>电子商务</th>\n",
       "      <td>600</td>\n",
       "      <td>2010</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>金融科技</th>\n",
       "      <td>70</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>马耳他</th>\n",
       "      <th>区块链</th>\n",
       "      <td>150</td>\n",
       "      <td>2017</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          估值（亿人民币）  成立年份\n",
       "国家  行业                  \n",
       "中国  云计算        150  2015\n",
       "    人工智能       400  2016\n",
       "    健康科技       600  2019\n",
       "    共享经济      3600  2016\n",
       "    区块链        800  2017\n",
       "...            ...   ...\n",
       "韩国  游戏         350  2007\n",
       "    物流         200  2011\n",
       "    电子商务       600  2010\n",
       "    金融科技        70  2011\n",
       "马耳他 区块链        150  2017\n",
       "\n",
       "[103 rows x 2 columns]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
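    "# Compute one function (max) for several columns of 按国家行业分, i.e. grouped by [\"国家\",\"行业\"]\n",
    "# Each column's max is taken independently, so the two values in a row may come from different companies\n",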
    "按国家行业分[[\"估值（亿人民币）\",\"成立年份\"]].agg(\"max\")"
   ]
  },
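  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above applies a single function to two columns. When each column needs its own set of functions, a dictionary can be passed to `.agg`. This is a minimal sketch under assumptions: the frame `toy` and its columns `country`, `industry`, `valuation`, and `founded` are hypothetical stand-ins for the real data.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data; names and values are illustrative only\n",
    "toy = pd.DataFrame({\n",
    "    \"country\": [\"US\", \"US\", \"CN\", \"CN\", \"CN\"],\n",
    "    \"industry\": [\"AI\", \"Cloud\", \"AI\", \"AI\", \"Cloud\"],\n",
    "    \"valuation\": [100, 200, 300, 400, 500],\n",
    "    \"founded\": [2012, 2015, 2014, 2016, 2011],\n",
    "})\n",
    "\n",
    "# Dict keys are column names; each value lists the functions for that column\n",
    "report = toy.groupby([\"country\", \"industry\"]).agg(\n",
    "    {\"valuation\": [\"sum\", \"max\"], \"founded\": [\"min\"]}\n",
    ")\n",
    "print(report)\n",
    "\n",
    "# Result columns have two levels, so a cell is addressed as (column, function)\n",
    "cn_ai_total = report.loc[(\"CN\", \"AI\"), (\"valuation\", \"sum\")]\n",
    "cn_ai_oldest = report.loc[(\"CN\", \"AI\"), (\"founded\", \"min\")]\n",
    "print(cn_ai_total, cn_ai_oldest)\n",
    "```\n",
    "\n",
    "The two-level column labels keep track of which function produced which number, at the cost of slightly longer lookups."
   ]
  },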
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
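    "### Several ways to call agg (computing multiple statistics)\n",
    "```python\n",
    "# Named aggregation: keyword = output column name, value = function\n",
    "df.groupby([\"国家\",\"行业\"])[\"估值（亿人民币）\"].agg(sum=\"sum\", mean=\"mean\", count=\"count\")\n",
    "# List of function names: one output column per function\n",
    "df.groupby([\"国家\",\"行业\"])[\"估值（亿人民币）\"].agg([\"sum\",\"mean\",\"max\",\"min\"])\n",
    "# Dict of column -> functions: output columns get two levels (column, function)\n",
    "df.groupby([\"国家\",\"行业\"]).agg({\"估值（亿人民币）\":[\"sum\",\"mean\",\"count\"]})\n",
    "```"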
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>sum</th>\n",
       "      <th>mean</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>国家</th>\n",
       "      <th>行业</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">中国</th>\n",
       "      <th>云计算</th>\n",
       "      <td>460</td>\n",
       "      <td>92.000000</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>人工智能</th>\n",
       "      <td>2090</td>\n",
       "      <td>139.333333</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>健康科技</th>\n",
       "      <td>2060</td>\n",
       "      <td>158.461538</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>共享经济</th>\n",
       "      <td>4740</td>\n",
       "      <td>592.500000</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>区块链</th>\n",
       "      <td>1250</td>\n",
       "      <td>312.500000</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">韩国</th>\n",
       "      <th>游戏</th>\n",
       "      <td>350</td>\n",
       "      <td>350.000000</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>物流</th>\n",
       "      <td>200</td>\n",
       "      <td>200.000000</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>电子商务</th>\n",
       "      <td>740</td>\n",
       "      <td>246.666667</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>金融科技</th>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>马耳他</th>\n",
       "      <th>区块链</th>\n",
       "      <td>150</td>\n",
       "      <td>150.000000</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           sum        mean  count\n",
       "国家  行业                           \n",
       "中国  云计算    460   92.000000      5\n",
       "    人工智能  2090  139.333333     15\n",
       "    健康科技  2060  158.461538     13\n",
       "    共享经济  4740  592.500000      8\n",
       "    区块链   1250  312.500000      4\n",
       "...        ...         ...    ...\n",
       "韩国  游戏     350  350.000000      1\n",
       "    物流     200  200.000000      1\n",
       "    电子商务   740  246.666667      3\n",
       "    金融科技    70   70.000000      1\n",
       "马耳他 区块链    150  150.000000      1\n",
       "\n",
       "[103 rows x 3 columns]"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"国家\",\"行业\"])[\"估值（亿人民币）\"].agg(sum = \"sum\",mean = \"mean\",count = \"count\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>sum</th>\n",
       "      <th>mean</th>\n",
       "      <th>max</th>\n",
       "      <th>min</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>国家</th>\n",
       "      <th>行业</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">中国</th>\n",
       "      <th>云计算</th>\n",
       "      <td>460</td>\n",
       "      <td>92.000000</td>\n",
       "      <td>150</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>人工智能</th>\n",
       "      <td>2090</td>\n",
       "      <td>139.333333</td>\n",
       "      <td>400</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>健康科技</th>\n",
       "      <td>2060</td>\n",
       "      <td>158.461538</td>\n",
       "      <td>600</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>共享经济</th>\n",
       "      <td>4740</td>\n",
       "      <td>592.500000</td>\n",
       "      <td>3600</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>区块链</th>\n",
       "      <td>1250</td>\n",
       "      <td>312.500000</td>\n",
       "      <td>800</td>\n",
       "      <td>100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">韩国</th>\n",
       "      <th>游戏</th>\n",
       "      <td>350</td>\n",
       "      <td>350.000000</td>\n",
       "      <td>350</td>\n",
       "      <td>350</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>物流</th>\n",
       "      <td>200</td>\n",
       "      <td>200.000000</td>\n",
       "      <td>200</td>\n",
       "      <td>200</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>电子商务</th>\n",
       "      <td>740</td>\n",
       "      <td>246.666667</td>\n",
       "      <td>600</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>金融科技</th>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>70</td>\n",
       "      <td>70</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>马耳他</th>\n",
       "      <th>区块链</th>\n",
       "      <td>150</td>\n",
       "      <td>150.000000</td>\n",
       "      <td>150</td>\n",
       "      <td>150</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           sum        mean   max  min\n",
       "国家  行业                               \n",
       "中国  云计算    460   92.000000   150   70\n",
       "    人工智能  2090  139.333333   400   70\n",
       "    健康科技  2060  158.461538   600   70\n",
       "    共享经济  4740  592.500000  3600   70\n",
       "    区块链   1250  312.500000   800  100\n",
       "...        ...         ...   ...  ...\n",
       "韩国  游戏     350  350.000000   350  350\n",
       "    物流     200  200.000000   200  200\n",
       "    电子商务   740  246.666667   600   70\n",
       "    金融科技    70   70.000000    70   70\n",
       "马耳他 区块链    150  150.000000   150  150\n",
       "\n",
       "[103 rows x 4 columns]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"国家\",\"行业\"])[\"估值（亿人民币）\"].agg([\"sum\",\"mean\",\"max\",\"min\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th colspan=\"3\" halign=\"left\">估值（亿人民币）</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>sum</th>\n",
       "      <th>mean</th>\n",
       "      <th>count</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>国家</th>\n",
       "      <th>行业</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">中国</th>\n",
       "      <th>云计算</th>\n",
       "      <td>460</td>\n",
       "      <td>92.000000</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>人工智能</th>\n",
       "      <td>2090</td>\n",
       "      <td>139.333333</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>健康科技</th>\n",
       "      <td>2060</td>\n",
       "      <td>158.461538</td>\n",
       "      <td>13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>共享经济</th>\n",
       "      <td>4740</td>\n",
       "      <td>592.500000</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>区块链</th>\n",
       "      <td>1250</td>\n",
       "      <td>312.500000</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th rowspan=\"4\" valign=\"top\">韩国</th>\n",
       "      <th>游戏</th>\n",
       "      <td>350</td>\n",
       "      <td>350.000000</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>物流</th>\n",
       "      <td>200</td>\n",
       "      <td>200.000000</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>电子商务</th>\n",
       "      <td>740</td>\n",
       "      <td>246.666667</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>金融科技</th>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>马耳他</th>\n",
       "      <th>区块链</th>\n",
       "      <td>150</td>\n",
       "      <td>150.000000</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "         估值（亿人民币）                  \n",
       "              sum        mean count\n",
       "国家  行业                             \n",
       "中国  云计算       460   92.000000     5\n",
       "    人工智能     2090  139.333333    15\n",
       "    健康科技     2060  158.461538    13\n",
       "    共享经济     4740  592.500000     8\n",
       "    区块链      1250  312.500000     4\n",
       "...           ...         ...   ...\n",
       "韩国  游戏        350  350.000000     1\n",
       "    物流        200  200.000000     1\n",
       "    电子商务      740  246.666667     3\n",
       "    金融科技       70   70.000000     1\n",
       "马耳他 区块链       150  150.000000     1\n",
       "\n",
       "[103 rows x 3 columns]"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.groupby([\"国家\",\"行业\"]).agg({\"估值（亿人民币）\":[\"sum\",\"mean\",\"count\"]})"
   ]
  },
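  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dictionary form of `agg` used above yields two-level column headers. As a hedged sketch of an alternative, pandas (0.25 and later) also accepts *named aggregation*, where each keyword becomes a flat column name. The frame `toy` and its values are made up for illustration, and the output names `total`, `average`, and `n` are arbitrary choices; only the column names are borrowed from the unicorn data.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy data with made-up values, reusing the notebook's column names\n",
    "toy = pd.DataFrame({\n",
    "    '国家': ['中国', '中国', '美国', '美国', '美国'],\n",
    "    '行业': ['人工智能', '人工智能', '云计算', '云计算', '人工智能'],\n",
    "    '估值（亿人民币）': [100, 300, 200, 400, 500],\n",
    "})\n",
    "\n",
    "# Named aggregation: keyword = (source column, function) gives one flat column per keyword\n",
    "report = toy.groupby(['国家', '行业']).agg(\n",
    "    total=('估值（亿人民币）', 'sum'),\n",
    "    average=('估值（亿人民币）', 'mean'),\n",
    "    n=('估值（亿人民币）', 'count'),\n",
    ")\n",
    "print(report)\n",
    "```\n",
    "\n",
    "Flat column names are usually easier to sort on and to export, whereas the two-level header keeps the source column visible in the report."
   ]
  },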
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
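    "## Data sense\n",
    "\n",
    "![02_split-apply-comine_detailed.png](02_split-apply-comine_detailed.png#full)\n",
    "\n",
    "<br class=\"break\">\n",
    "-----\n",
    "\n",
    "> In the world of data, what has long been combined must be split, and what has long been split must be combined. A feel for moving from aggregation to disaggregation and back is something we need to master. Let us sum up the data sense learned today, and mark today as a milestone in training that \"data sense\".\n",
    "\n",
    "A feel for moving between aggregation and disaggregation:\n",
    "\n",
    "* `国家` (country), `城市` (city), and `行业` (industry) are the keys for `groupby`: their values repeat across observations, and that repetition is what makes grouping possible.\n",
    "* `企业名称` (company name) is what gets counted, on the assumption that each unicorn corresponds to exactly one company name.\n",
    "* `估值（亿人民币）` (valuation, in units of 100 million RMB) and `成立年份` (founding year) are the quantities to summarize; they are the targets of `agg`.\n",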
    "\n",
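    "The three roles above can be traced by hand on a tiny frame. This is a minimal sketch under stated assumptions: `toy`, its company names, and its valuations are made up, and only the column names come from the unicorn data.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical companies and valuations, for illustration only\n",
    "toy = pd.DataFrame({\n",
    "    '企业名称': ['A', 'B', 'C', 'D'],\n",
    "    '国家': ['中国', '中国', '美国', '美国'],\n",
    "    '估值（亿人民币）': [100, 300, 200, 600],\n",
    "})\n",
    "\n",
    "# Split: one sub-frame per value of the category\n",
    "parts = {country: part for country, part in toy.groupby('国家')}\n",
    "# Apply: summarize the quantity inside each piece\n",
    "totals = {country: part['估值（亿人民币）'].sum() for country, part in parts.items()}\n",
    "# Combine: collect the per-group results into one Series\n",
    "combined = pd.Series(totals)\n",
    "\n",
    "# The one-line groupby does the same three steps internally\n",
    "one_step = toy.groupby('国家')['估值（亿人民币）'].sum()\n",
    "print(combined.to_dict() == one_step.to_dict())\n",
    "```\n",
    "\n",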
    "### Categorical data in pandas\n",
    "pandas can store a variable as [categorical data](https://www.jianshu.com/p/20169d7f60bc), so that it is no longer just a generic `object` column.\n",
    "* [User Guide: Categorical data](https://pandas.pydata.org/pandas-docs/stable/user_guide/categorical.html)\n",
    "* [pandas.CategoricalDtype](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.CategoricalDtype.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0      中国\n",
      "1      中国\n",
      "2      中国\n",
      "3      美国\n",
      "4      美国\n",
      "       ..\n",
      "489    美国\n",
      "490    中国\n",
      "491    中国\n",
      "492    美国\n",
      "493    美国\n",
      "Name: 国家, Length: 494, dtype: object\n",
      "0      中国\n",
      "1      中国\n",
      "2      中国\n",
      "3      美国\n",
      "4      美国\n",
      "       ..\n",
      "489    美国\n",
      "490    中国\n",
      "491    中国\n",
      "492    美国\n",
      "493    美国\n",
      "Name: 国家, Length: 494, dtype: category\n",
      "Categories (24, object): [中国, 以色列, 卢森堡, 印度, ..., 西班牙, 阿根廷, 韩国, 马耳他]\n"
     ]
    }
   ],
   "source": [
    "# E1: the same column as plain object dtype, then converted to categorical dtype\n",
    "print(df[\"国家\"])\n",
    "print(df[\"国家\"].astype(\"category\"))"
   ]
  },
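  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`astype('category')` yields an unordered (nominal) categorical. For an ordinal variable, the order can be declared with `CategoricalDtype`. The sketch below uses a hypothetical `sizes` variable that is not part of the unicorn data; the labels and their order are assumptions chosen for illustration.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "from pandas.api.types import CategoricalDtype\n",
    "\n",
    "# Hypothetical ordinal variable (not a column of the unicorn data)\n",
    "sizes = pd.Series(['medium', 'small', 'large', 'small'])\n",
    "\n",
    "# Declare the categories and their order explicitly\n",
    "size_type = CategoricalDtype(categories=['small', 'medium', 'large'], ordered=True)\n",
    "sizes_cat = sizes.astype(size_type)\n",
    "\n",
    "# Sorting follows the declared order, not alphabetical order\n",
    "print(sizes_cat.sort_values().tolist())\n",
    "# min and max are only defined because the categorical is ordered\n",
    "print(sizes_cat.min(), sizes_cat.max())\n",
    "```\n",
    "\n",
    "With the order declared, sorting and comparisons respect small < medium < large instead of falling back to string order."
   ]
  },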
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "中国         AxesSubplot(0.1,0.15;0.363636x0.75)\n",
       "美国    AxesSubplot(0.536364,0.15;0.363636x0.75)\n",
       "dtype: object"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEFCAYAAAAIZiutAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAaT0lEQVR4nO3df3Qd9Xnn8fdjWZZlOSWkjpWaNsjUblaK+GGiNLiorG7cNFVTCHBIgkzZ3UrBMSco3Z6QNUFkiVPuAS+B5MRtcMRekyzFSoEtoq5xDzVIFO/is2t3aeJjsbi7tpt6QxvHxEQ2lmzx7B93JF9pJPvq/vBoRp/XOTq+8525M8+Vn6tnvvOdH+buiIiI5JoTdQAiIjLzqDiIiEiIioOIiISoOIiISIiKg4iIhKg4xJiZPWZmfxR1HCLlZGYfMbOP5EyvMrPWKGOaDeZGHYAU5XTwE2JmfweMAENTvLcKwN0/XJ7QRIpjZnMBA44C3zGzVcGsPwZuNzMD5rj7SLC8cr6EVBzi7RRTFIdg3meAHwPDnnNBi5lVAb8EbCl7hCKF+z3gbmA4mH4NmA/8CPhTsoXjUeC7wXzlfAmpOMSImf1n4GPAT4Km9wOfMLOOYHoJkHH3r5D9ogA8ASwxs18GTgBHyH7BbgLeOV+xi0yXu/ea2U+AC8jm6keBWqCHbGE45e47ct6inC8hFYd4GQbud/dNAGb2J8DenOmvEk7+NrJd7f9Ids/rSaCS7F6USBzYNNuV8yWg4hAv+ez1TFxmPdACjO5FfQF4HfhqKQMTKTUz2w0cB0YPD72H7LjBxcH0HDN7EGhx95/lvFU5XwIqDvEyB/iymX02mH4/cG3O9BLgkdw3uPvdMNareA1IAfcBFecjYJFCuXtT7rSZ3QH81N17zvE+5XwJqDjESyXnPqxUmbO8mdk8dx/OadsO/ArwxvkJWaQwZnYf8HGye/8OLAfeNrPPBYvMBV5z98+Of5tyvhRUHOLlc5zpYk/mazmvjWy3+kkzGz2177eCf6uCZac6ZisSOXe/B7gHwMw+QHYg+n8D/9Xdn57kLcr5ElJxiBF3nzieMIecCxknzK8EfjTVOd1mVof+/2WGM7NqYC2wOvj5P8ATZnYD8E1gd84pq8r5EtIvKt5qgHlTzJsLPJWzBzVRFbpCXmYwM3sUaAUeBz6WM+j8aTO7EegGLjazFe5+COV8SZke9pNMwR7XSdd/sMSUmS0Gjrr7VBd6Yma/6O4/DV4r50tIxUFERELUxRIRkRAVBxERCZnRA9KLFi3yurq6qMOIpePHj1NTUxN1GLGzZ8+eI+7+3qi2r5wvjPK9cFPl/IwuDnV1dezevTvqMGKpv7+flpaWqMOIHTM7FOX2lfOFUb4Xbqqc12ElEREJUXEQEZEQFQcREQlRcRARkRAVBxERCVFxSJienh4aGxtZtWoVjY2N9PSc9db3IiKTyutUVjOrBZ52998MpjNAA7DN3e8rtk1Ko6enh66uLjKZDCMjI1RUVNDRkX28dFtbW8TRiUicnLPnYGYXAt8jewdQgrshVrj7SuASM1teTFu5PthslE6nyWQypFIp5s6dSyqVIpPJkE6now5NRGImn57DCPAZ4NlguoXsA7sBngeagRVFtO3P3ZiZrQHWANTW1tLf3z+NjzO7DQwMMDIyQn9/P4ODg/T39zMyMsLAwIB+jzOYcr54o/kupXPO4uDubwGYjT1AqQY4HLw+ClxZZNvE7XWTvU87TU1Nrqse81dfX09FRQUtLS1jV4z29fVRX1+vq0dnMOV88XSFdOkVMiA9CFQHrxcG6yimTUqkq6uLjo4O+vr6OH36NH19fXR0dNDV1RV1aCISM4XcW2kP2cNBu4DLyT7T9Z+KaJMSGR107uzsZGBggPr6etLptAajRWTaCikOvcDLZraE7CP8riL70PtC26SE2traaGtrUzdbRIqS92Ed
d28J/n2L7KD0LiDl7seKaSvZJxERkZIp6Jbd7v4mZ846KrpNRERmFg0Ii4hIiIqDiIiEqDiIiEiIioOIiISoOIiISIiKg4iIhKg4iIhIiIqDiIiEqDiIiEiIioOIiISoOIiISIiKg4iIhKg4iIhIiIqDiIiEqDiIiEiIioOIiISoOIiISIiKg4iIhKg4iIhIiIqDiIiEqDiIiEiIioOIiISoOIiISIiKg4iIhKg4iIhIiIqDiIiEqDiIiEiIioOIiISoOIiISIiKg4iIhKg4iIhIyLSLg5ldaGbPmdluM/tO0JYxs1fM7J6c5fJqExGRmaeQnsOtwBPu3gS8y8z+A1Dh7iuBS8xsuZndmE9byT6FiIiU1NwC3vNToNHM3g38CnAMeDKY9zzQDKzIs23/xJWb2RpgDUBtbS39/f0FhCiDg4P63cWEcr54yvfSK6Q47AQ+AXwBGADmAYeDeUeBK4GaPNtC3L0b6AZoamrylpaWAkKU/v5+9LuLB+V88ZTvpVfIYaV7gbXu/jXgNWA1UB3MWxisczDPNhERmYEK+QN9IXCpmVUAHwEeIHuICOBy4CCwJ882ERGZgQo5rHQ/8BhwMfAK8A3gZTNbArQCVwGeZ5uIiMxA0+45uPv/cPcPuvtCd/+Yu78FtAC7gJS7H8u3rVQfQkRESquQnkOIu7/JmTORptUmIiIzjwaFRUQkRMVBRERCVBxERCRExUFEREJUHEREJETFQUREQlQcREQkRMVBRERCVBxERCRExUFEREJUHEREJETFQUREQlQcREQkRMVBRERCVBxERCRExUFEREJUHEREJETFQUREQlQcREQkRMVBRERCVBxERCRExUFEREJUHEREJETFQUREQlQcREQkRMVBRERCVBxERCRExUFEREJUHEREJETFQUREQooqDmb2bTO7NnidMbNXzOyenPl5tYmIyMxScHEws98E3ufuW83sRqDC3VcCl5jZ8nzbSvIpRESkpOYW8iYzqwQeBZ4zs08CLcCTwezngWZgRZ5t+yesew2wBqC2tpb+/v5CQpz1BgcH9buLCeV88ZTvpVdQcQD+DbAP+E9AJ/B5IBPMOwpcCdQAh/NoG8fdu4FugKamJm9paSkwxNmtv78f/e7iQTlfPOV76RVaHFYA3e7+hpn9GfAbQHUwbyHZw1WDebaJiMgMU+gf538ALgleNwF1ZA8RAVwOHAT25NkmIiIzTKE9hwyw2cxuBirJjjn8pZktAVqBqwAHXs6jTUREZpiCeg7u/nN3/5S7X+PuK939ENkCsQtIufsxd38rn7ZSfAgRESmtQnsOIe7+JmfORJpWm4iIzCwaEBYRkRAVBxERCVFxEBGREBUHEREJUXEQEZEQFQcREQlRcRARkRAVBxERCVFxEBGREBUHEREJUXFImJ6eHhobG1m1ahWNjY309PREHZKIxFDJ7q0k0evp6aGrq4tMJsPIyAgVFRV0dHQA0NbWFnF0IhIn6jkkSDqdJpPJkEqlmDt3LqlUikwmQzqdjjo0EYkZFYcEGRgYoLm5eVxbc3MzAwMDEUUkInGl4pAg9fX17Ny5c1zbzp07qa+vjygiEYkrFYcE6erqoqOjg76+Pk6fPk1fXx8dHR10dXVFHZqIxIwGpBNkdNC5s7OTgYEB6uvrSafTGowWkWlTcUiYtrY22tra6O/vp6WlJepwRCSmdFhJRERCVBxERCRExUFEREJUHEREJETFQUREQlQcREQkRMVBRERCVBxERCRExUFEREJUHEREJETFQUREQlQcREQkRMVBRERCCi4OZlZrZv8reJ0xs1fM7J6c+Xm1iYjIzFNMz+HrQLWZ3QhUuPtK4BIzW55vW/Hhi4hIORT0PAcz+yhwHHgDaAGeDGY9DzQDK/Js2z/JutcAawBqa2vp7+8vJMRZb3BwUL+7mFDOF0/5XnrTLg5mNg/4CnAD0AvUAIeD2UeBK6fRFuLu3UA3QFNTk+uBNYXRw37iQzlfPOV76RVy
WOku4Nvu/rNgehCoDl4vDNaZb5uIiMxAhfyB/i3g82bWD1wBXEv2EBHA5cBBYE+ebSIiMgNN+7CSu18z+jooENcBL5vZEqAVuArwPNtERGQGKurQjru3uPtbZAeldwEpdz+Wb1sx25bJ9fT00NjYyKpVq2hsbKSnpyfqkEQkhgo6W2kid3+TM2ciTatNSqenp4euri4ymQwjIyNUVFTQ0dEBQFtbW8TRiUicaFA4QdLpNJlMhlQqxdy5c0mlUmQyGdLpdNShiZSFesrlU5Keg8wMAwMDNDc3j2trbm5mYGAgoohEykc95fJSzyFB6uvr2blz57i2nTt3Ul9fH1FEIuWTTqdZvXo1nZ2dfPzjH6ezs5PVq1erp1wi6jkkSFdXFx0dHWN7Un19fXR0dOjLIom0b98+Tpw4Eeo5HDx4MOrQEkHFIUFGu9KdnZ0MDAxQX19POp1WF1sSad68edxxxx2kUqmxK6TvuOMO7r777qhDSwQVh4Rpa2ujra1NtxOQxBseHmbjxo2sWLFirKe8ceNGhoeHow4tEVQcRCSWGhoauP7668f1lG+55RZ6e3ujDi0RVBxEJJa6uromPVtJY2yloeIgIrGkMbbyUnEQkdjSGFv56DoHEREJUXEQEZEQFQcREQlRcRARkRAVBxERCVFxEBGREBWHhNH97WU2Ub6Xj65zSBDd315mE+V7eannkCB6EpzMJsr38lJxSBA9CU5mE+V7eak4JIieBCezSX19PevXrx835rB+/Xrle4moOCTI6JPg+vr6OH369NiT4Lq6uqIOTaTkUqkUGzZsoL29nW3bttHe3s6GDRtIpVJRh5YIGpBOEN2lUmaTvr4+1q1bx+bNm8fyfd26dXqeQ4mYu0cdw5Sampp89+7dUYcRS7pLZWHMbI+7N0W1feV8/ioqKjh58iSVlZVj+X7q1Cnmz5/PyMhI1OHFxlQ5r8NKIhJLGmMrLxUHEYkljbGVl8YcRCSWNMZWXuo5iIhIiHoOIhJLun1GeannICKxpNtnlFdBxcHMLjCz7Wb2vJk9Y2bzzCxjZq+Y2T05y+XVJiIyXbp9RnkV2nO4BXjY3X8beAO4Gahw95XAJWa23MxuzKetFB9CRGYfncpaXgWNObj7t3Mm3wv8PvDNYPp5oBlYATyZR9v+3HWb2RpgDUBtbS39/f2FhDjrDQ4O6ncXE8r5wtxwww3ccsstfOlLX2Lp0qV84xvf4MEHH6Sjo0O/wxIoakDazFYCFwIHgcNB81HgSqAmz7Zx3L0b6Ibs1aK6yrcwukI6PpTzhWlpaaGhoYF0Oj12KutDDz2kwegSKXhA2szeA2wE2oFBoDqYtTBYb75tIiIFaWtrY+/evbzwwgvs3btXhaGECh2Qngc8BXzZ3Q8Be8geIgK4nGxPIt82ERGZYQo9rNRB9pBQl5l1AY8Bt5rZEqAVuApw4OU82kREZIYpqOfg7o+4+4Xu3hL8fA9oAXYBKXc/5u5v5dNWig8hIrNTT0/PuIf99PT0RB1SYpTsCml3f5MzZyJNq01EZLp0hXR5aUA4YbQnJbOFrpAuL91bKUG0JyWzia6QLi/1HBJEe1Iym+gK6fJScUgQ7UnJbKKH/ZSXDislSH19PZ/+9KfZvn07Q0NDVFVV0draqj0pSSQ97Ke81HNIkIsuuoje3l7a29vZunUr7e3t9Pb2ctFFF0UdmkhZ6Arp8lHPIUFeeuklrr76ajZv3swjjzxCVVUVV199NS+99FLUoYlIzKg4JMjQ0BCHDx9m+/btY2crtbe3MzQ0FHVoIhIzOqyUIGZGa2vruLOVWltbMbOoQxORmFHPIWG6u7tZtmwZDQ0NPPzww3R3d0cdkojEkIpDgjQ0NFBdXc2dd96Ju2NmfOhDH+Ltt9+OOjSRsujs7OTRRx8dOzvvtttuY+PGjVGHlQgqDgmSSqXYtGkTX//612loaGDfvn2sW7eOtWvXRh2aSMl1dnayadMmNmzYMC7fARWIEjB3jzqG
KTU1Nfnu3bujDiM2Ghsbuf766+nt7R0773t0eu/evVGHFwtmtsfdm6LavnI+f/Pnz+emm27i1VdfHcv3K664gqeffpqTJ09GHV5sTJXzKg4JUlFRwcmTJ6msrBx7TOipU6eYP38+IyMjUYcXCyoO8WFm1NXVsXnz5nFn5x08eJCZ/Hdtppkq53VYKUF0hbTMJmbGsmXLxl0hvWzZMg4dOhR1aImgU1kTRFdIy2zi7uzYsYNrrrmGZ599lmuuuYYdO3ao11AiOqyUIDoGWzwdVoqP+fPn09TUxO7du8d6yqPTyvf86bDSLDA0NER3dzcLFiwYG3M4ceIETzzxRNShiZTc8PDwpHcEGB4ejjq0RFBxSJCqqipWrFjB/v37x65zWL58OVVVVVGHJlJyDQ0NLF++nNbW1nFjbDU1NVGHlggac0iQxYsX8/rrr7Ny5UqeeuopVq5cyeuvv87ixYujDk2k5FKpFL29vWP3DhsaGqK3t5dUKhVxZMmgMYcEmTNnDgsWLOD48eNjbTU1NZw4cYJ33nknwsjiQ2MO8TF//vxJbypZVVWlMYdpmCrn1XNIEHfn+PHj3H777WzdupXbb7+d48eP6+wNSaTRwnDdddfxzDPPcN11141rl+Ko55AgZkZVVdW4L8fo9Ez+f55J1HOIDzNj6dKlLFiwYOzsvBMnTnDgwAHl+zSo5zBLDA0NUVdXx+OPP05dXZ32oiTRDhw4QHt7O9u2baO9vZ0DBw5EHVJi6GylhKmqquLQoUPceuutk/YkRJLm3nvvZXBwkIULF0YdSqKo55AwQ0NDrF27lq1bt7J27VoVBkm8wcHBcf9KaajnkDCLFi1i06ZNPPLII5gZixYt4siRI1GHJVIS+T7VMHc5jT8URj2HmDOzsR+AI0eOjH0Z3H2sMExcTiSO3H3sZ8uWLSxdupQXX3yR99/Zy4svvsjSpUvZsmXLuOWkMOo5xNzE5L/sssv44Q9/ODZ96aWX8oMf/OB8hyVSdm1tbUD2oT//uG+Azu31pNPpsXYpjk5ljYnL1z/PsbdPlW39F1RX8vf3/nbZ1h8XOpV1ZlC+nz+68V7MHXv7FAcf+ETey4/eeC9fdXdtKyAqkfJ4p+6LvKuc6wfgh+dYanaLpDiYWQZoALa5+31RxBA376q/i0u/d9f03vS96awfIP/iI1JOPx94QDtDETvvxcHMbgQq3H2lmW02s+Xuvv98xxE3Px94YNL2Qxt+b9rrunjdX4XaLqiunPZ6RMppsj/gyvfz57yPOZjZt4C/dvfnzOxmoNrdH8uZvwZYA1BbW/uh73//++c1vqTQRUGFSaVS533MQTlfPOV74abK+SgOK9UAh4PXR4Erc2e6ezfQDdnBuel0FeWM6XazJTrK+eIp30sviuscBoHq4PXCiGIQEZGziOIP8x6gOXh9OXAwghhEROQsojis1Au8bGZLgFbgqghiEBGRszjvPQd3fwtoAXYBKXc/dr5jEBGRs4vkOgd3fxN4Mopti4jIuWkwWEREQmb0vZXM7CfAoajjiKlFgO7VPX0Xu/t7o9q4cr5gyvfCTZrzM7o4SOHMbHeUN5ATOZ+U76Wnw0oiIhKi4iAiIiEqDsnVHXUAIueR8r3ENOYgIiIh6jmIiEiIioOIiISoOBTJzM75OzSz+TmvK82sqCeN5LPNCctXm1lFMds8y7p/wcwuKce6ZWZSzs+OnFdxKN6LZvYBADP7XTP75iTL9JrZvzazOuAPgM1mVmdmv2pmZ72FiZldbGYPTWi+3cy+MI3k/0qw3cnWv8XM/ruZ7Zjw8//M7PJgmYvM7Ekz+wsze87M/tbMfmBm/wz8X+DbOeu738yW5xmXxJNyfhbkfCT3VkoKM7uM7O/wH4KmE8DJYJ4BBiwFhoAq4FPAh4PXNwXv/VPg51OsfwHwMHDbhFm3AvcDr5rZCDAP+ADwQXd/bZJVnSb7HI3JnALaJ77PzL4LDAeTPwY2B+s4AXwSeAP4jru/M2F9
D5D9Q/AHwU0WJUGU87Mn51UcinM/8ALwjJm9G/gF4D1mdhVQSXbv5XPAALAD+DLwy8A7wAXAH7v7pF+SwOeBh9z96GiDmV0LLHT3Z4Fng7YHge4pviSQfajSu6aY52QT+8SE9gayX1KCL8Nf58Tw68DJSb4kuPsxM/sa8EfA+rN8Nokn5fzElSU051UcCmRmtwC/CvxPd78uaGsBfsfd7wqmP0X2gUYH3P0dM6shuwcE8AngwnNs5jJ3fzBnmzVAmjN7N5jZR4Bfc/cvnWU9DWS/xI9OMb/d3V8zs+uB97v7tyb5vC8AC8juES4B3jGzfwfMB37m7r8zuqy7/72Z3XmOzyYxo5yfXTmv4lC4N4EvAh8OutrfAt7Nmb2oHuBl4N+TfX4FZB+Puix4vTiPbZyeMH0z2Yt9fh/GBukeAtqmWoGZXRDE5Wa22N3/ZZLFHjWz42T37Bab2e+SHY/6R3f/bLDMSeDfuvs/mdlasntR3w2OKf9JHrFL/CnnZ1HOqzgUyN2fM7PRx50uBHbl7D01A63uvi84hjrql4DRxHsf8Dfn2MxpM7sg54FI/wUYIfiiAO3Adnf/0VnW8QXgceAw2WOj7ZMsc1uwFzUH6MvdI8qR99WSZlZN9tizJIhyfmpJzHmdrVQapyZpm2wv4gjZx6T2An+Xx3r/DLhrdMLdT0045rmG7N4bAGZ2adANH53+DbKPYt3k7r3AL5rZH062oWBPsALYZWY3BW25Ow9T5Uol2ePJub4I/Pk5PpvEm3J+vMTlvIpDcUZ/fxXAajPrN7N+YGPOPAPmBKfgHQN2Bj+vA5zt1Dx3f4nscc47JswyM3sf8OMJg3vrgV8PFriZ7Ol2N7n7SDD/VuAzZva4mdUGbReR/bJ9hewhgPuAPzSzNmBj0N0G+BFnjvtaEMMVwF8Bz+UE1gG8x923T/W5JNaU87Mk53VYqTjVZE+pqwS2TOhiXxssU0X2QSTPAT8Bvprz/g+T/T/4/lQbcPeuICEnbvdfgGEz6yPb/a0GjgL/zcz+FXA9sMrdf5qzrrfM7KPA3UFM/wzsAx5z91dHlzOzT5L94tQALwbv/VzO9iuzTf6qmX3Q3XP3GPe4e2aqzyOxp5yfJTmvG++VgGWv/qx094mnxokkknI++VQcREQkRGMOIiISouIgIiIhKg4iIhKi4iAiIiH/H60G8UdLYSSlAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
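    "%matplotlib inline\n",
    "import matplotlib as mpl\n",
    "\n",
    "mpl.rcParams['font.sans-serif'] = ['SimHei']  # use a font with CJK glyphs so Chinese labels render\n",
    "mpl.rcParams['axes.unicode_minus'] = False  # keep the minus sign rendering correctly with that font\n",
    "\n",
    "# Box plots of valuation, one panel per country (China and the US only)\n",
    "df[df[\"国家\"].isin([\"中国\", \"美国\"])][[\"国家\", \"估值（亿人民币）\"]].groupby(by=\"国家\").boxplot()"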
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "中国         AxesSubplot(0.1,0.15;0.363636x0.75)\n",
       "美国    AxesSubplot(0.536364,0.15;0.363636x0.75)\n",
       "dtype: object"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYwAAAEGCAYAAAB2EqL0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAa0UlEQVR4nO3dfZBU133m8e/DoBkoiFlYk/HKa4uSi/UyeqGMJhtRhWt7IjFEmGwp0pYk9FKuLWwEstgXKlUCDbEdRZKFUqGcImEwhJIlrUTK2l1psRDaQRJdRhvJ1pBYggUn+1KgrFZUbEPAyNbo7bd/9EXTNN0zp5uG7p55PlW36Hv6nHPPvRz49bnnvigiMDMzG82ERjfAzMxagwOGmZklccAwM7MkDhhmZpbEAcPMzJI4YIwxkh6R9B8a3Q6z80nSb0r6zaL1ayRd18g2jQcTG90Aq7sPsuUskv4K+BAYqlC2AyAifuP8NM3s3EiaCAg4BnxH0jXZV38IrJQkYEJEfJjld5+vIweMsed9KgSM7LubgbeB96LoJhxJHcA/AZ487y00q90S4F7gvWz9J8Ak4O+AP6MQTLYC382+d5+vIweMFifpz4GFwE+zpM8CX5K0LFu/GNgWEb9P4R8PwBPAxZL+KfBL4GcU/tH9a+CjC9V2s2pFxDOSfgpMo9BXfwvoBLZTCBbvR8QLRUXc5+vIAaP1vQd8KyI2A0j6U+BA0fo3OfsfxFIKw/SvU/iF9j3gIgq/tsxagapMd5+vAweM1pfy66g0zx8AOeD0r61/C/wt8M16Nsys3iQNAu8Ap08tzaAwD3FJtj5B0h8BuYj4h6Ki7vN14IDR+iYAayV9JVv/LPA7ResXA/3FBSLiXvh49PEToAe4H2i7EA02q1VEdBevS7ob+HlEbB+lnPt8HThgtL6LGP2U1EVF+SWpPSLeK0rbBXwGOHphmmxWG0n3A4sojBICmA38StKdWZaJwE8i4itnFnOfrwcHjNZ3J8PD83LuK/osCkPy70k6fZnhtdmfHVneSueAzRouItYB6wAkfZ7CZPffAP85Iv5TmSLu83XkgNHiIqJ0fmICRTdklnx/EfB3la45lzQL9wlrcpImAyuAW7PlfwNPSPpd4NvAYNHls+7zdeQDNfZMAdorfDcReKrol1apDnz3vzUxSVuB64DHgYVFE9s3SboB2AJcIukLEXEE9/m6kl+gNH5kv8zeDf+lW4uS9OvAsYiodHMqkv5xRPw8++w+X0cOGGZmlsRDMTMzS+KAYWZmSRwwzMwsSctdJfXJT34yZs2a1ehmtKR33nmHKVOmNLoZLWffvn0/i4iZjdq++3xt3N9rV6nPt1zAmDVrFoODg41uRkvK5/PkcrlGN6PlSDrSyO27z9fG/b12lfq8T0mZmVkSBwwzM0vigGFmZkkcMMzMLIkDhpmZJXHAMDOzJA4YZmaWxAHDzMyStNyNezY6qfoXiPmpxdaq3N8vHI8wxqCIKLtccs+zFb8za1Xu7xeOA4aZmSVxwDAzsyQOGGZmlsQBw8zMkjhgmJlZEgcMMzNL4oBhZmZJHDDMzCyJA4aZmSVxwDAzsyQOGGZmlmTUgCFpmqRdkgYkPS2pXdI2Sa9IWleUr1PS3qL1lZLy2fJjSd+pUP9ESW8W5b2iPrtmZmb1lDLCuA3YEBG9wFHgFqAtIuYDl0qaLWk68Cgw5XShiOiPiFxE5IC9wNYK9V8JbD+dNyL2n8P+mJnZeTJqwIiITRGxO1udCdwOfC9bHwAWAB8CNwMnS8tL+jTQGRGDFTZxNbBE0o+ykYsfuW5m1oSS/3OWNB+YDhwG3sqSjwHzIuJklqdc0a8B/SNU/RpwbUS8LekxYDGwo2Tby4HlAJ2dneTz+dRmWwkfu9bgPl8fPm71lRQwJM0ANgI3AquBydlXUxlhlCJpAtAD9I1Q/RsRMZR9HgRml2aIiC3AFoDu
7u7I5XIpzbZSz+/Ex641uM/Xgft73aVMercDTwFrI+IIsI/CaSiAuRRGHJV8EfhhjPzGksclzZXUBlwPvJ7ScDMzu7BSJr2XAfOAPkl5QMAdkjYANwE7Ryi7CPjB6RVJXZLuL8lzH/A48GPglYh4Ib35ZmZ2oYx6Sioi+imZg5C0A1gIPBwRJ4ry5krK3luyfhBYV5J2gMKVUmZm1sRquiIpIo4zfKWUmZmNA77T28zMkjhgmJlZEgcMMzNL4oBhZmZJHDDMzCyJA4aZmSVxwDAzsyQOGGZmlsQBw8zMkjhgmJlZEgcMMzNL4oBhZmZJHDDMzCyJA4aZmSVxwDAzsyQOGGZmliTlnd7TJO2SNCDpaUntkrZJekXSuqJ8nZL2Fq1PlPSmpHy2XDHCNs6qz8zMmkvKCOM2YENE9AJHgVuAtoiYD1wqabak6cCjwJSiclcC2yMily37y1Uu6YbS+s5lh8zM7PwYNWBExKaI2J2tzgRuZ/j1rAPAAuBD4GbgZFHRq4Elkn6UjSAqvQ42V6Y+MzNrMsnv9JY0H5gOHAbeypKPAfMi4mSWp7jIa8C1EfG2pMeAxcCOMlVPKa2vzLaXA8sBOjs7yefzqc22Ej52rcF9vj583OorKWBImgFsBG4EVgOTs6+mUnmU8kZEDGWfB4FKp5pOjVZfRGwBtgB0d3dHLpdLabaVen4nPnatwX2+Dtzf6y5l0rsdeApYGxFHgH0MnzaaS2HEUc7jkuZKagOuB16vkC+1PjMza6CUEcYyCqeJ+iT1AY8Ad0i6GLiOwlxFOfcBTwICdkTEC5K6gFsjovhqqGeAvQn1mZlZA40aMCKiH+gvTpO0A1gIPBwRJ4ry5oo+H6BwpVRxXQeBdSVpJyXlytVnZmbNI3nSu1hEHGf4yqZzVu/6zMys/nynt5mZJXHAMDOzJA4YZmaWxAHDzMySOGCYmVmSmq6SsuYw9w8GOPGr96sqM2vNzuS80yZfxOvf6K22WWY2RjlgtLATv3qfww99KTl/Pp+v6lEJ1QQXMxv7fErKzMySOGCYmVkSBwwzM0vigGFmZkkcMMzMLIkDhpmZJXHAMDOzJA4YZmaWxDfutbBfm7OGKx5dU12hR6upHyD9xkCz88lPNmg8B4wW9otDD/lObxs3/GSDxhv1lJSkaZJ2SRqQ9LSkdknbJL0iaV1Rvk5Je0cqV6H+iZLelJTPlivqs2tmZlZPKXMYtwEbIqIXOArcArRFxHzgUkmzJU2ncLJjygjlfrtC/VcC2yMily37a90ZMzM7f0YNGBGxKSJ2Z6szgdsZfv/2ALAA+BC4GTg5Qrm/r7CJq4Elkn6UjVx8mszMrAkl/+csaT4wHTgMvJUlHwPmRcTJLE/FchHxaoWqXwOujYi3JT0GLAZ2lNSxHFgO0NnZST6fT232mFfNsTh16lTVx87HujHc58tzf2+spIAhaQawEbgRWA1Mzr6aygijlJJylbwREUPZ50FgdmmGiNgCbAHo7u6OaiayxrTnd1Y1qVftJGC19Vv9uM+X4f7ecCmT3u3AU8DaiDgC7KNwGgpgLoURR0q5Sh6XNFdSG3A98Hp6883M7EJJmfReBswD+iTlAQF3SNoA3ARUuhbtjHKSbpbUJen+knz3AY8DPwZeiYgXatgPMzM7z0Y9JRUR/UB/cZqkHcBC4OGIOFGUNzdSucy64pWIOEDhSikzM2tiNV2RFBHHGb5SyszMxgE/S8rMzJI4YJiZWRIHDDMzS+KAYWZmSRwwzMwsiQOGmZklccAwM7MkDhhmZpbEAcPMzJI4YJiZWRK/rMjMWsKvzVnDFY+uqa7Qo9XUD5D+zvDxyAHDzFrCLw49xOGH0v9Dr/Z9GLPWVHrwtp3mU1JmZpbEAcPMzJI4YJiZWRIHDDMzS+KAYWZmSUYNGJKmSdolaUDS05LaJW2T9IqkdUX5OiXtLSl7Vr4K20jKZ2ZmjZMywrgN2BARvcBR4BagLSLmA5dKmi1pOoUrnqec
LiTphtJ85SpPzWdmZo01asCIiE0RsTtbnQnczvD7vAeABcCHwM3AyaKiuTL5yknNZ2ZmDZR8456k+cB04DDwVpZ8DJgXESezPMVFppTmq1D1qPkkLQeWA3R2dpLP51ObPeZVfbPR8+n5p1yEj3WDuM+XV81xOHXqVNXHzcd5ZEkBQ9IMYCNwI7AamJx9NZXKo5RT9coXEVuALQDd3d1Rzd2bY9nhXHX5Z63ZWdWdstY47vNlPL+zqju3q73Tu9r6x6OUSe924ClgbUQcAfYxfNpoLoURRzn1zmdmZg2UMsJYRuE0UZ+kPuAR4A5JFwPXAVdXKPcMsLc4n6Qu4NaIWDdSvtp2xczMzqeUSe/+iJgeEblseZTCRPWrQE9EnCjKmyv6fLI0X0QcLAkWZfOd816ZmVnd1fS02og4zvCVTRcsn5mZNY7v9DYzsyQOGGZmlsQBw8zMkjhgmJlZEgcMMzNL4oBhZmZJHDDMzCyJA4aZmSVxwDAzsyQOGGZmlsQBw8zMkjhgmJlZEgcMMzNL4oBhZmZJHDDMzCxJTe/DMDNrhFlrdlZX4Pn0/NMmX1Rla8YfBwwzawmHH/pSVflnrdlZdRkb2ainpCRNk7RL0oCkpyW1S9om6RVJ64rynZEmaaWkfLb8WNJ3KtQ/UdKbRXmvqN/umZlZvaTMYdwGbIiIXuAocAvQFhHzgUslzZZ0Q2la9i7wXPae773A1gr1XwlsL3pn+P5z3iszM6u7UQNGRGyKiN3Z6kzgdobfvz0ALAByZdIAkPRpoDMiBits4mpgiaQfZaMUnyYzM2tCyf85S5oPTAcOA29lyceAecCUMmmnfQ3oH6Hq14BrI+JtSY8Bi4EdJdteDiwH6OzsJJ/PpzbbSvjYtQb3+frwcauvpIAhaQawEbgRWA1Mzr6aSmGUcqpMGpImAD1A3wjVvxERQ9nnQWB2aYaI2AJsAeju7o5cLpfSbCv1/E587FqD+3wduL/XXcqkdzvwFLA2Io4A+xg+5TSXwoijXBrAF4EfRkSMsInHJc2V1AZcD7xe5T6YmdkFkDLCWEbhFFOfpD7gEeAOSRcD11GYgwhgb0kawCLgB6crktQF3BoR64rqvw94EhCwIyJeOLddMjOz82HUgBER/ZTMQUjaASwEHo6IE1larjQtIu4tqesgsK4k7QCFK6XMzKyJ1XRFUkQcZ/iqqIppZmY2dvhZUmZmlsQBw8zMkjhgmJlZEgcMMzNL4oBhZmZJHDDMzCyJA4aZmSVxwDAzsyQOGGZmlsQBw8zMkjhgmJlZEgcMMzNL4oBhZmZJHDDMzCyJA4aZmSVxwDAzsyQOGGZmlmTUgCFpmqRdkgYkPS2pXdI2Sa9IWleU74w0SRMlvSkpny1XjLCNs+ozM7PmkjLCuA3YEBG9wFHgFqAtIuYDl0qaLemG0jQK7+neHhG5bNlfrvIKZc3MrMmM+k7viNhUtDoTuB34drY+ACwAvsDw+7xPp00GlkjqAfYDd0bEB2U2kStT9n9WtRdmZnbejRowTpM0H5gOHAbeypKPAfOAKWXSXgSujYi3JT0GLAZ2lKm6XNnSbS8HlgN0dnaSz+dTm20lfOxag/t8ffi41VdSwJA0A9gI3AispjB6AJhK4bTWqTJpb0TEUJY2CFQ61VSu7BkiYguwBaC7uztyuVxKs63U8zvxsWsN7vN14P5edymT3u3AU8DaiDgC7KNw2ghgLoURR7m0xyXNldQGXA+8XmET5cqamVmTSRlhLKNwmqhPUh/wCHCHpIuB64CrgQD2lqS9ATwJCNgRES9I6gJujYjiq6GeKVPWzMyaTMqkdz/QX5wmaQewEHg4Ik5kabmStBMUrpQqrusgsK4k7WSZsmZm1mSSJ72LRcRxhq9sqph2LvWZmVlz8Z3eZmaWxAHDzMySOGCYmVkSBwwzM0tS06S3NTdJlb9bXz49Is5Ta8xsrPAIYwyKiLLLnj17Kn5nZjYaBwwzM0vigGFmZkkcMMzMLIkD
hpmZJXHAMDOzJA4Y48D27du5/PLLueaaa7j88svZvn17o5tkZi3I92GMcdu3b6evr49t27bx4Ycf0tbWxrJlywBYunRpg1tnZq3EI4wx7oEHHmDbtm309PQwceJEenp62LZtGw888ECjm2ZmLcYBY4w7dOgQCxYsOCNtwYIFHDp0qEEtMrNW5YAxxs2ZM4eXX375jLSXX36ZOXPmNKhFZtaqHDDGuL6+PpYtW8aePXv44IMP2LNnD8uWLaOvr6/RTTOzFjPqpLekacBfAG3AO8DNFF7Z2gXsjIj7s3zbitPKlYuI98rUPxH4P9kCsCoi9p/rjlnB6YntVatWcejQIebMmcMDDzzgCW8zq1rKCOM2YENE9AJHgVuAtoiYD1wqabakG0rTypT77Qr1Xwlsj4hctjhY1NnSpUs5cOAAL774IgcOHHCwMLOajDrCiIhNRaszgduBb2frA8AC4AsMv5N7AFhQptzfV9jE1cASST3AfuDOiPggeQ/MzOyCSL4PQ9J8YDpwGHgrSz4GzAOmlEk7o1xEvFqh6teAayPibUmPAYuBHSXbXg4sB+js7CSfz6c224qcOnXKx65FuM/Xh49bfSUFDEkzgI3AjcBqYHL21VQKp7VOlUkrLVfJGxExlH0eBGaXZoiILcAWgO7u7sjlcinNthL5fB4fu9bgPl8Hz+90f6+zUecwJLUDTwFrI+IIsI/CaSiAuRRGHGellSlXyeOS5kpqA64HXq9lR6wyPxrEzOohZYSxjMIppj5JfcAjwB2SLgauozAHEcDekrTScv0U5ihujYh1RfXfBzwJCNgRES/UZc8M8KNBzKyOKr2yc6SFwlzGTcCnRko7H8tVV10Vlu6yyy6Ll156KSIi9uzZExERL730Ulx22WUNbFVrAQbjPPbp0Rb3+dpccs+zjW5Cy6rU52t6+GBEHGf4qqiKadZ4fjSImdWL7/Qe4/xoEDOrFweMMc6PBjGzevH7MMY4PxrEzOrFAWMcWLp0KUuXLvV9GGZ2TnxKyszMkjhgmJlZEgeMcWDVqlVMmjSJnp4eJk2axKpVqxrdJDNrQZ7DGONWrVrF5s2bWb9+PV1dXRw8eJB77rkHgI0bNza4dWbWSjzCGOO2bt3K+vXrWb16NZMmTWL16tWsX7+erVu3NrppZtZiHDDGuKGhIVasWHFG2ooVKxgaGqpQwqy1SCq7HFm/pOJ3VhsHjDGuo6ODzZs3n5G2efNmOjo6GtQis/oq98yjiGDPnj0jPQ/PauA5jDHuq1/96sdzFl1dXWzYsIF77rnnrFGHmdloHDDGuNMT2/feey9DQ0N0dHSwYsUKT3ibWdV8Smoc2LhxI++++y579uzh3XffdbAws5o4YJiZWRIHDDMzS5LyTu9pknZJGpD0tKR2SdskvSJpXVG+pLQK20jKZ7Xxnd42nixatIgJEybQ09PDhAkTWLRoUaObNGakjDBuAzZERC9wFLgFaIuI+cClkmZLuiElrVzlqfmsNqfv9H7wwQfZtWsXDz74IJs3b3bQsDFp0aJFDAwMsGLFCr7//e+zYsUKBgYGHDTqZNSAERGbImJ3tjoTuJ3hV7EOAAuAXGJaOan5rAa+09vGk927d7Ny5Uo2bdrE1KlT2bRpEytXrmT37t2jF7ZRJV9WK2k+MB04DLyVJR8D5gFTEtPKGTWfpOXAcoDOzk7y+Xxqs8e9oaEhurq6yOfznDp1inw+T1dXF0NDQz6OTcx9vjYRweLFi8/o74sXL6a/v9/HsA6SAoakGcBG4EZgNTA5+2oqhVHKqcS0ckbNFxFbgC0A3d3d4ZcApevo6ODgwYOsXr364xcobdiwgY6ODr9MqYm5z9dGEs899xybNm36uL/fddddSHJ/r4NRA4akduApYG1EHJG0j8Jpo1eBucDfAP83Ma2ccvVZnfhObxtPFi5cSH9/PwCLFy/mrrvuor+/n97e3ga3bIyo9KyVomeurASOA/ls+TLwOrABOARMAz6RmNYF
3F9S/1n5RmrPVVddFVadu+++Ozo6OgKIjo6OuPvuuxvdpJYCDMYo/07O5+I+X53e3t6QFEBIit7e3kY3qeVU6vOKGh7EJWk6sBD4QUQcrSYttb5Kuru7Y3BwsOo2G36nd40k7YuI7kZt332+Nu7vtavU52t6llREHGf4yqaq0lLrMzOz5uI7vc3MLIkDhpmZJXHAMDOzJA4YZmaWpKarpBpJ0k+BI41uR4v6JPCzRjeiBV0SETMbtXH3+Zq5v9eubJ9vuYBhtZM02MjLQ80uJPf3+vMpKTMzS+KAYWZmSRwwxpctjW6A2QXk/l5nnsMwM7MkHmGYmVkSB4xxRlKHpM81uh1mF4r7fP04YLQYSX8iaZ6kRyX9ozLf/9fs6b+V3ErhZVijbedbkrolTZD0LyV9TtJXzqXtZrVwn28eDhitZ1L253eBL6vgIknK0r8HXHE6s6SLij7PBO4Hpkp6VtK+7M/nJO0pyjeJwkut/hqYT+E97m8CN53H/TKrxH2+SXjSu8lJ+i7wOeCdLOlzFN59/g9AB3ALsAaYA5T+ZQr4fxHxb7J/RDuBlyLioazuVyPi6jLb/HfAzIhYJ+k/AusjYr+kjcD2iPjLeu+n2Wnu882rpvdh2AX1EfDViPgJgKSvUXj97QFgbvbCqX8vqSciin8xTQS+ERG/nyV9msJrcD8l6dks7fOSnqPQD3ZExJ9K+gzwe8A2ST3ARxGxP8v/TeC/SFoSEb84nztt45r7fJNywGh+ATyRDb//CngWuBT4F8D/Ksq3VtL7EfFytv4bwD//uJKIw8DXJb0E9EbEB9mvrcUl2/si8EcUftV9C3hL0nEKr9H9DPA/KJwT/k59d9PsY+7zTcoBo/lNBm4AOrM/XwfuAtqBrxflW09hmL4kW/9XFM75AiCpLftY9hykpAkUTlE+KWkBMAP4LeA9Cr/EFktaS+Fdv7vrs2tmZbnPNylPeje/zwI/B6ZROI97GJgLPFicKSJeBC7JruwQhY7/34qyLKdwPvdXwDPZEP3z2QTgs9l3y0rq/CXwBQqnAqBw/viXdd07s7O5zzcpjzCamKRPAJ+gcPXGw8A3gD8H9gK/A/x6lmdzRHwE/CEwFegF8tkQ/PSvqH6gv6T+H0bEEs42AZggqZ3COdzfy9JnAifqu5dmw9znm5sDRnNbCTwG/HfgTuDPgD+OiCckPQH8CfBl4GVJH5UWlvQyhatGvgtsLVP/1Arb7aAw/P9j4MmIOJRduTId+Ntz2SGzUbjPNzFfVtvETl9PHhHvZ0PuT0XE20Xf/7OIOG+dWZLCHcQuIPf55uaAYWZmSTzpbWZmSRwwzMwsiQOGmZklccAwM7MkDhhmZpbk/wNYeKYcldHPjgAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 2 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "df[df[\"国家\"].isin([\"中国\", \"美国\"])][[\"国家\", \"成立年份\"]].groupby(by=\"国家\").boxplot()"
   ]
  },
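  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A box plot draws the quartiles of each group. As a rough numeric counterpart, `describe()` on a grouped column reports the same quartiles as a table. The sketch below runs on made-up values in a hypothetical frame `toy`, not on the unicorn data.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Made-up valuations for two countries\n",
    "toy = pd.DataFrame({\n",
    "    '国家': ['中国'] * 4 + ['美国'] * 4,\n",
    "    '估值（亿人民币）': [70, 100, 150, 1000, 70, 80, 200, 400],\n",
    "})\n",
    "\n",
    "# One row per group: count, mean, std, min, quartiles, max\n",
    "stats = toy.groupby('国家')['估值（亿人民币）'].describe()\n",
    "print(stats[['min', '25%', '50%', '75%', 'max']])\n",
    "```\n",
    "\n",
    "Reading the table next to the plot helps connect the box (25% to 75%), the line inside it (50%), and the extreme points to actual numbers."
   ]
  },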
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>企业名称</th>\n",
       "      <th colspan=\"2\" halign=\"left\">估值（亿人民币）</th>\n",
       "      <th colspan=\"2\" halign=\"left\">成立年份</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>数量</th>\n",
       "      <th>总和</th>\n",
       "      <th>均值</th>\n",
       "      <th>最新</th>\n",
       "      <th>最早</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>行业</th>\n",
       "      <th>城市</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>金融科技</th>\n",
       "      <th>杭州</th>\n",
       "      <td>4</td>\n",
       "      <td>10290</td>\n",
       "      <td>2572.500000</td>\n",
       "      <td>2015</td>\n",
       "      <td>2009</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>媒体和娱乐</th>\n",
       "      <th>北京</th>\n",
       "      <td>7</td>\n",
       "      <td>6890</td>\n",
       "      <td>984.285714</td>\n",
       "      <td>2013</td>\n",
       "      <td>2003</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>共享经济</th>\n",
       "      <th>北京</th>\n",
       "      <td>5</td>\n",
       "      <td>4040</td>\n",
       "      <td>808.000000</td>\n",
       "      <td>2016</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>云计算</th>\n",
       "      <th>纽约</th>\n",
       "      <td>4</td>\n",
       "      <td>3950</td>\n",
       "      <td>987.500000</td>\n",
       "      <td>2011</td>\n",
       "      <td>2002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>消费品</th>\n",
       "      <th>旧金山</th>\n",
       "      <td>2</td>\n",
       "      <td>3550</td>\n",
       "      <td>1775.000000</td>\n",
       "      <td>2017</td>\n",
       "      <td>2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>房地产科技</th>\n",
       "      <th>迈阿密</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2013</td>\n",
       "      <td>2013</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>新能源</th>\n",
       "      <th>坎贝尔</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2007</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>房地产科技</th>\n",
       "      <th>纽约</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2012</td>\n",
       "      <td>2012</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>共享经济</th>\n",
       "      <th>塔林</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2013</td>\n",
       "      <td>2013</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>新能源</th>\n",
       "      <th>布里斯托尔</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2009</td>\n",
       "      <td>2009</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>298 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "            企业名称 估值（亿人民币）               成立年份      \n",
       "              数量       总和           均值    最新    最早\n",
       "行业    城市                                          \n",
       "金融科技  杭州       4    10290  2572.500000  2015  2009\n",
       "媒体和娱乐 北京       7     6890   984.285714  2013  2003\n",
       "共享经济  北京       5     4040   808.000000  2016  2011\n",
       "云计算   纽约       4     3950   987.500000  2011  2002\n",
       "消费品   旧金山      2     3550  1775.000000  2017  2015\n",
       "...          ...      ...          ...   ...   ...\n",
       "房地产科技 迈阿密      1       70    70.000000  2013  2013\n",
       "新能源   坎贝尔      1       70    70.000000  2007  2007\n",
       "房地产科技 纽约       1       70    70.000000  2012  2012\n",
       "共享经济  塔林       1       70    70.000000  2013  2013\n",
       "新能源   布里斯托尔    1       70    70.000000  2009  2009\n",
       "\n",
       "[298 rows x 5 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Group by industry, then city: count companies, summarize valuation and founding year\n",
    "先行再城 = (\n",
    "    df.groupby(by=[\"行业\", \"城市\"])\n",
    "    .agg({\n",
    "        \"企业名称\": \"count\",\n",
    "        \"估值（亿人民币）\": [\"sum\", \"mean\"],\n",
    "        \"成立年份\": [\"max\", \"min\"],\n",
    "    })\n",
    "    .sort_values(by=[(\"估值（亿人民币）\", \"sum\")], ascending=False)\n",
    "    .rename(columns={\"sum\": \"总和\", \"mean\": \"均值\", \"count\": \"数量\", \"max\": \"最新\", \"min\": \"最早\"})\n",
    ")\n",
    "display(先行再城)"
   ]
  },
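  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A report like the one above has two header rows and the group keys in the index, which can be awkward to export. One possible way to flatten it is sketched below on a hypothetical frame `toy` with made-up values; the underscore separator is an arbitrary choice.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical companies and valuations, for illustration only\n",
    "toy = pd.DataFrame({\n",
    "    '企业名称': ['A', 'B', 'C'],\n",
    "    '行业': ['云计算', '云计算', '区块链'],\n",
    "    '估值（亿人民币）': [100, 300, 150],\n",
    "})\n",
    "\n",
    "table = toy.groupby('行业').agg({'企业名称': 'count', '估值（亿人民币）': ['sum', 'mean']})\n",
    "\n",
    "# Join the two header levels into one string per column\n",
    "flat = table.copy()\n",
    "flat.columns = ['_'.join(col) for col in flat.columns]\n",
    "# Move the group key out of the index into an ordinary column\n",
    "flat = flat.reset_index()\n",
    "print(flat.columns.tolist())\n",
    "```\n",
    "\n",
    "The flattened frame has a single header row, so it can be written out with `to_csv` or `to_excel` without merged header cells."
   ]
  },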
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>企业名称</th>\n",
       "      <th colspan=\"2\" halign=\"left\">估值（亿人民币）</th>\n",
       "      <th colspan=\"2\" halign=\"left\">成立年份</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>数量</th>\n",
       "      <th>总和</th>\n",
       "      <th>均值</th>\n",
       "      <th>最新</th>\n",
       "      <th>最早</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>城市</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>北京</th>\n",
       "      <td>81</td>\n",
       "      <td>22130</td>\n",
       "      <td>273.209877</td>\n",
       "      <td>2019</td>\n",
       "      <td>2001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>旧金山</th>\n",
       "      <td>55</td>\n",
       "      <td>17060</td>\n",
       "      <td>310.181818</td>\n",
       "      <td>2017</td>\n",
       "      <td>2004</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>杭州</th>\n",
       "      <td>19</td>\n",
       "      <td>13290</td>\n",
       "      <td>699.473684</td>\n",
       "      <td>2015</td>\n",
       "      <td>2000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>上海</th>\n",
       "      <td>47</td>\n",
       "      <td>8990</td>\n",
       "      <td>191.276596</td>\n",
       "      <td>2017</td>\n",
       "      <td>2001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>纽约</th>\n",
       "      <td>25</td>\n",
       "      <td>8640</td>\n",
       "      <td>345.600000</td>\n",
       "      <td>2015</td>\n",
       "      <td>2002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>洛桑市</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2012</td>\n",
       "      <td>2012</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>盐湖城</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2008</td>\n",
       "      <td>2008</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>半月湾</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2014</td>\n",
       "      <td>2014</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>罗利</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2011</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>马德里</th>\n",
       "      <td>1</td>\n",
       "      <td>70</td>\n",
       "      <td>70.000000</td>\n",
       "      <td>2011</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>120 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "    企业名称 估值（亿人民币）              成立年份      \n",
       "      数量       总和          均值    最新    最早\n",
       "城市                                       \n",
       "北京    81    22130  273.209877  2019  2001\n",
       "旧金山   55    17060  310.181818  2017  2004\n",
       "杭州    19    13290  699.473684  2015  2000\n",
       "上海    47     8990  191.276596  2017  2001\n",
       "纽约    25     8640  345.600000  2015  2002\n",
       "..   ...      ...         ...   ...   ...\n",
       "洛桑市    1       70   70.000000  2012  2012\n",
       "盐湖城    1       70   70.000000  2008  2008\n",
       "半月湾    1       70   70.000000  2014  2014\n",
       "罗利     1       70   70.000000  2011  2011\n",
       "马德里    1       70   70.000000  2011  2011\n",
       "\n",
       "[120 rows x 5 columns]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Same report, grouped by city only, sorted by total valuation (largest first)\n",
    "城市 = df.groupby ( by = ['城市'] ) \\\n",
    "             .agg ({ \"企业名称\" : \"count\", \\\n",
    "                     \"估值（亿人民币）\":[\"sum\",\"mean\"], \\\n",
    "                     \"成立年份\":[\"max\",\"min\"],               }) \\\n",
    "             .sort_values ( by = [(\"估值（亿人民币）\",\"sum\")], ascending = False) \\\n",
    "             .rename ( columns = {\"sum\":\"总和\", \"mean\":\"均值\", \"count\":\"数量\", \"max\":\"最新\", \"min\":\"最早\"} )\n",
    "display(城市)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['北京', '旧金山', '杭州'], dtype='object', name='城市')"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "城市.index[0:3]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "北京          AxesSubplot(0.1,0.559091;0.363636x0.340909)\n",
       "旧金山    AxesSubplot(0.536364,0.559091;0.363636x0.340909)\n",
       "杭州              AxesSubplot(0.1,0.15;0.363636x0.340909)\n",
       "dtype: object"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEGCAYAAABo25JHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAaR0lEQVR4nO3df3Bd5X3n8fdHsi07IoCpjSkU8IYSoqwJU+IwmIipFK/Nj5DMNuym2CTDdlW8Dq232TQUO5cCmUYx0Dq7qRvbOFEDnYC6sE1iYn6ZgXsnqMHM2mkZDMo2dNYkIWEDMQFMiH/I3/3jHMG1kKx77Sufe3Q+rxmN7j3nOec+1350P/d5nvNDEYGZmRVXS9YVMDOzbDkIzMwKzkFgZlZwDgIzs4JzEJiZFZyDYBKSNCvrOphNBLftieEgmJz+l6SPZF0Js0aSNBUoSyrVsc1CSYsnsFqTwpSsK2Djk/T/gN8C5gOliLisat2/AW4A9gLDJ4W0An8t6cNVu5kC/FVE/ODo1NpsdJI2Av83IlZL+gLwH0Yp9s8RccWIZV8BvgN8SNJgRHyzhpc7D5gKbDmiSk9yDoJ82BcR+yTtA/aPWPci0Afs4a0gGM0U4GcTVD+zeuwF9qWPZwF/HxE3Da+UdDFQqnreAvx3oCUiPifpWOBeSe8DvhgRew/xWvtIvhjZIXhoqElJmjbO+g9KegSYGhEDwOnA18b4uTAitkbEKxNcbbN6HTjUcklzgYdIesQ3SDoJeAdwFXA+8K+SPi/p5LT8o5IWjfViki6UdH/jqj85uEfQvP4hbdwBnFi1/GRJfw+cAFwfES+ny48FKhHx6eqdSPpj4DePRoXNDsNYvdiQdA5wP0nv4HKSLzVnAq8DPyH5pn8p8F9J2v9P021H9ppHescR1nnScRA0qYh4c7JX0k+qVh0P/HVEfG/EJofq3fmCUtbMlkv691XP3wn8OCKelPTbEfEGcDtAOqfwbETcXlX+6qrHBzh0e/ffwigcBPnzzCghAEkQfFxSV/q4heSb0fHAXUevemZ1CWDDKHMEKwEi4g1J2wGRfMifDOxJe7qtwIyIeM9Rr/Uk4yCYJCJio6QZwB3AvwMuBq4BLo6IezOtnNnYVEOZi4HFEXFndY9A0p8Cmya2esXgyeImJalV0rmSrgVmjlHmNEmt6ePPA0s4uOs7G1gt6SvpMdhmzWasIKhePgVYI+nUN1dK7cC1JEfLMaKs1cn/aE1I0nuBx4AB4AHg1+mq/cBpkqaT/AGsAXZI+m2SoyoWRcRr6Yd+RMTzkhYAdwObJV0cvgGFNZcWRp8jGJ74JSJ+Juk24D9VlfnPwN9FxI9H7G8q8DVJu9Pns4EWScPnKrQDLzSw/pOCg6AJRcQz6STZywCSetPDSZ8BngUeJ/nG9GOS46t/D/ifEfFrSX8FXElykhkR8Wp6lvEFDgFrEtOBtvTxFEafI7gxfXwb0Jmu2kcy57UPOAZ4WdIlJEcBvTsihiLig4d6YUmdJF+grIr82TC5pMdZ74uIX2RdF7PRSGoDDkTEvnEL21HhIDAzKzhPFpuZFZyDwMys4JpmsnjWrFkxd+7crKuRS6+//jrt7e1ZVyN3tm/f/lJEzM7q9d3mD4/b++Ebq803TRDMnTuXbdu2ZV2NXKpUKnR1dWVdjdyR9FyWr+82f3jc3g/fWG3eQ0M51t/fz7x581i4cCHz5s2jv78/6yqZWQ41TY/A6tPf30+pVKKvr4+hoSFaW1vp6ekBYMmSJRnXzszyxD2CnOrt7aWvr4/u7m6mTJlCd3c3fX199Pb2Zl01M8sZB0FODQ4O0tnZedCyzs5OBgcHM6qRmeWVgyCnOjo6GBgYOGjZwMAAHR0dGdXIzPLKQZBTpVKJnp4eyuUy+/fvp1wu09PTQ6lUGn9jM7MqnizOqeEJ4RUrVjA4OEhHRwe9vb2eKDazujkIckQa+x4e
Tz/9NEuXLmXp0qUHLfe1pMxsPB4aypGIGPXn9Os2j7nOzGw8DgIzs4JzEJiZFZyDwMys4DxZbGZN6VAHR4zF82KHxz0CM2tKPjji6HEQmJkVnIPAzKzgHARmZgXnIDAzKzgHgZlZwTkIzMwKzkFgZlZwDgIzs4JzEJiZFZyDwMys4BwEZmYF5yAwMys4X320CZ3z+S288sa+uraZu/K+msseN2MqT964uN5qmdkk5SBoQq+8sY+dN3+45vKVSoWurq6ay9cTGmY2+XloyMys4BwEZmYF5yAwMys4zxGYWaZ8cET2HARmlikfHJE9Dw2ZmRWcg8DMrOAcBGZmBec5gib0zo6VnH3Hyvo2uqOe/QPUPiZrNpHc3rPnIGhCrw3e7MkzKwy39+x5aMjMrOAcBGZmBeehoSZVd3f2wfpOsDEzG+YgaEL1jJdCEhr1bmNmNmzcoSFJx0l6QNIWSd+SNE1Sn6THJV1fVW6OpMdGbPu2cmZm1lxqmSO4EvhSRCwGXgCuAFojYgHwLklnSppJckBX+/BGkj42slzjq29mZkdq3CCIiHUR8XD6dDbwCeDu9PkWoBMYAn4feLVq065RypmZWZOpeY5A0gJgJrATeD5dvAs4NyJeTctUb9I+stwo+1wGLAOYM2cOlUqlrsrbW/xvlw9u86ObyIMj2qf672M8NQWBpBOAtcDlwGeAGemqYxi7V7F7vHIRsRHYCDB//vyo5yQRq/LgfXWdYGPZcZt/u51d9ZX3wRGNV8tk8TTgHmBVRDwHbOetYZ5zSHoIo6m1nJmZZaiWHkEPybBOSVIJ+DrwSUknA5cA54+x3beBx2ooZ2ZmGaplsnh9RMyMiK705w6SieCtQHdEvFJVtqvq8atjlTMzs+ZxWCeURcTLvHVE0BGXMzOz7PhaQ2ZmBecgMDMrOAeBmVnBOQjMzArOQWBmVnAOAjOzgnMQmJkVnIPAzKzgfIeyHBlxddeD190y+vKImKDamE0st/ejxz2CHImIUX/K5fKY68zyyu396HEQ5NhFF11ES0sL3d3dtLS0cNFFF2VdJTPLIQdBTl100UVs2bKF5cuX853vfIfly5ezZcsWh4GZ1c1zBDn18MMP86lPfYp169ZRqVRYt24dABs2bMi4ZmaWN+4R5FREsHr16oOWrV692uOkZlY3B0FOSWLVqlUHLVu1atUhj7Qwy7MVK1Ywffp0uru7mT59OitWrMi6SpOGh4ZyatGiRaxfvx6ASy+9lGuuuYb169ezePHijGtm1ngrVqxgw4YN3HLLLbz3ve/lmWee4brrrgNg7dq1GdduEhjrMKyj/fP+978/rD6LFy8OSQGEpFi8eHHWVcoVYFu4zedCW1tbrFmzJiIiyuVyRESsWbMm2traMqxV/ozV5j00lGMPPfQQBw4coFwuc+DAAR566KGsq2Q2Ifbs2cPy5csPWrZ8+XL27NmTUY0mFweBmTW9tra2tx0Rt2HDBtra2jKq0eTiOQIza3pXX3011157Lbfeeis///nPOfHEE3nxxRe55pprsq7apOAegZk1vQsuuID29nZ27dpFRLBr1y7a29u54IILsq7apOAgMLOm19vby6ZNm9i7dy/lcpm9e/eyadMment7s67apOAgMLOmNzg4SGdn50HLOjs7GRwczKhGk4uDwMyaXkdHBwMDAwctGxgYoKOjI6MaTS4Oghzr7+9n3rx5LFy4kHnz5tHf3591lcwmRKlUoqenh3K5zP79+ymXy/T09FAqlbKu2qTgo4Zyqr+/n1KpRF9fH0NDQ7S2ttLT0wPAkiVLMq6dWWMNt+kVK1YwODhIR0cHvb29busN4h5BTvX29tLX10d3dzdTpkyhu7ubvr4+T57ZpLVkyRJ27NjBI488wo4dOxwCDeQgyClPnplZozgIcsqTZ2bWKA6CnPLkmZk1iieLc8qTZ2bWKA6CHFuyZAlLliyhUqnQ1dWVdXXMLKc8NGRmVnAOghzzCWVm1ggeGsopn1Bm
Zo3iHkFO+YQyM2sUB0FO+YQyM2sUB0FO+YQyM2uUcYNA0nGSHpC0RdK3JE2T1CfpcUnXV5WraZk1hk8oM7NGqWWy+ErgSxHxsKT1wBVAa0QskPS3ks4Ezq5lWUT8cOLeSrH4hDIza5RxgyAi1lU9nQ18Avgf6fMtQCfwO8DdNSxzEDSQTygzs0ao+fBRSQuAmcBO4Pl08S7gXKC9xmUj97kMWAYwZ84cKpVKvfU3YPfu3f63ywm3+SPn9t54NQWBpBOAtcDlwGeAGemqY0jmGXbXuOwgEbER2Agwf/788Lfaw+MeQX64zR85t/fGq2WyeBpwD7AqIp4DtpMM8wCcQ9JDqHWZmZk1mVp6BD0kwzolSSXg68AnJZ0MXAKcDwTwWA3LzMysySgi6t9ImgksAr4bES/Us+wQ+3wReK7uyhjALOClrCuRQ6dHxOysXtxt/rC5vR++Udv8YQWBNRdJ2yJiftb1MDsa3N4bz2cWm5kVnIPAzKzgHAQ5I+mdoyzeeNQrYpYdt/cG8xxBE5O0CfhcRDydPv8N4B+B90XE3kwrZ2aThnsEze3XwJsf+BHxC+Ax4P0jCyrx46rnP5T0jvRxSdKfHYX6mlkO+Q5lTSg9k/sskkt6vC89DPerwL60yFpJAAci4rx02TRgqGo3e3grRPYA+ye63mZHk6Q24Lci4l+zrkveuUfQJCR9WdK5ku4A5gLdJFdw/RDwDuBHwKKImD/8Q3rmtqTLgD8FZkm6eLgnAJwp6ZQRr7Na0nxJLZJ+V9IZkv7w6LxLs7dUt3lJx4+yflP6JWgsS0kufTPe67jNj8NB0Dymp79vBy4E1pOcODMQERWgDDwuaVv6s7xqnuCPgf9G8v+5DngKOA24C3hi+AUkTScJj38CFpBcSfZHwMcn9J2Zja66zV+VDm9OVdrdJbl68dnDhSVNrXo8G/gCcIykzZK2p7/vl1SuKuc2XwNPFmdE0u3AGcDr6aIzSK7S+kugDfguUErXP5v+PhP4Z+AUYE9EzE//OH4J/CXJvSJ6gE3ACySXBtmePn+J5LIfsyPieknfAG6JiKckrQX6I+J7E/2+rbhqaPNXACuBDpK2etDmwE8j4g/SNn8f8GhE3Jzue2tEvO0yNpL+BLf5cXmOIDsHgKsj4gcAkv4I+AmwA/ggybf8R4AHgeOBbwLXAlcB/wA8nu7nXSQf8r9BEhCrSL5pnU76x0LyR3Q8SUj0SeommV94Kt3HTcA3JV0WEa9N3Fu2gjtUmz8nvQzNpyV1R0T1t/opwI0R8efpolOArcBJkjany86SdD/JZ9q9EfE3kk4FPovb/LgcBNkJ4M60G/x9YDPJh/p5wEkkN/9ZCvweyYf7ScCLwAeA33xzJxH/Bzhd0lbgiYi4LH18DHBJROyX9FngfSS9hjOA1cDzkl4GngROBZ5OX++2iX7jVliHavPPVpVbJWlfRAzflPsDwHve3EnETuAGSY8Ci9M2vjUiLh3xehfiNl8TB0F2ZgAfA+akv58EriE5+mdhRAxJWkpy8sxKkj+UJ4GPAgPw5kX9dpN805pN0ns4iKQWkh7B99OfE0gmoPeSfHO6VNIqYFtEPDxh79bs0G3+hqpyt5C0+cvS5x8lmUcAQFJr+nDUce3hNh8Rd0nqxG1+XJ4szs5pwC+A40jGSXeS3Lfhi1VlppBM9naQTG7dQ9Kg/zfJuQRfBL5BcivQU4DOtKt8FsmRR/eSDA+dV7VPIuJXJLcS3ZEuagN+1di3Z/Y2tbR5IuIRkl7uGWnv4UPAQ1VFlpG06zeAbw+3+XSyeHO6rmfEPt3mD8E9ggxIOhY4luRohluBG4GvkZws9hHgxLTMVJJDR4PkCIkPAhXgfpIP+gD+LF23NSJK6f6fSPd7b0Tsk3Q9ybeuFqAlvdnQTSTjp5D0Jl6ZyPdsxVZHm98QEQeAvyAZ3lwMVNLhn+Fv+utJjqqr3v8TEXEZb+c2XwMfNZQBSdeRnPx1GzAP+AqwJiLulHQn
cDHJpPDnSIZ9xOhHUdweEV9Nu8otEbEv3f/TEfFvR3ndRcBCkvtJb01f73aSE9f+oy9bYRPlMNr8qLshbfOj7N9t/gg4CDIwfDx0+m1dwEkR8bOq9e+OiH+ZwNdX+D/ejiK3+ebmIDAzKzhPFpuZFVzTTBbPmjUr5s6dm3U1cun111+nvb0962rkzvbt21/K8p7FZs2iaYJg7ty5bNu2Letq5FKlUqGrqyvrauSOJN843gwPDZmZFZ6DwMys4BwEZmYF5yAwMys4B4GZWcE5CMzMCs5BYGZWcA4CM7OCcxCYmRWcg8DMrOAcBGZmBecgMDMrOAeBmVnBOQjMzArOQWBmVnAOAjOzgnMQmJkVnIPAzKzgHARmZgXnIDAzKzgHgZlZwU3JugJWO0l1bxMRE1ATM5tM3CPIkYgY9ef06zaPuc7MbDwOAjOzgnMQmJkVnIPAzKzgPFnchM75/BZeeWNfXdvMXXlfzWWPmzGVJ29cXG+1zGySchA0oVfe2MfOmz9cc/lKpUJXV1fN5esJDTOb/Dw0ZGZWcA4CM7OCcxCYmRWc5wia0Ds7VnL2HSvr2+iOevYPUPschJlNbg6CJvTa4M2eLDazo8ZDQ2ZmBecgMDMrOAeBmVnBOQjMzArOQWBmVnAOAjOzgnMQmJkVnM8jaFJ1H+v/YH1XHzUzG+YgaEL1nEwGSWjUu42Z2bBxh4YkHSfpAUlbJH1L0jRJfZIel3R9Vbk5kh4bse3bypmZWXOpZY7gSuBLEbEYeAG4AmiNiAXAuySdKWkmydVu2oc3kvSxkeUaX30zMztS4wZBRKyLiIfTp7OBTwB3p8+3AJ3AEPD7wKtVm3aNUs7MzJpMzXMEkhYAM4GdwPPp4l3AuRHxalqmepP2keVG2ecyYBnAnDlzqFQqdVXe3uJ/OzM7XDUFgaQTgLXA5cBngBnpqmMYu1exe7xyEbER2Agwf/78qOcKmlblwfvquvqomVm1WiaLpwH3AKsi4jlgO28N85xD0kMYTa3lzMwsQ7X0CHpIhnVKkkrA14FPSjoZuAQ4f4ztvg08VkM5MzPLkCKi/o2So4QWAd+NiBeOtBwkQ0Pbtm2ruy5FMmIOpiaH8/9bFJK2R8T8rOthlrXDusRERLwcEXeP9+FeazmrTUSM+lMul8dcZ2Y2Hl9ryMys4BwEZmYF5yAwMys4B4GZWcE5CMzMCs5BYGZWcA4CM7OCcxCYmRWcg8DMrOAcBGZmBecgMDMrOAeBmVnBOQjMzArOQWBmVnAOAjOzgnMQmJkVnIPAzKzgHARmZgXnIDAzKzgHQY719/czb948Fi5cyLx58+jv78+6SmaWQ1OyroAdnv7+fkqlEn19fQwNDdHa2kpPTw8AS5Ysybh2ZpYn7hHkVG9vL319fXR3dzNlyhS6u7vp6+ujt7c366qZWc44CHJqcHCQzs7Og5Z1dnYyODiYUY3MLK8cBDnV0dHBwMDAQcsGBgbo6OjIqEZmllcOgpwqlUr09PRQLpfZv38/5XKZnp4eSqVS1lUzs5zxZHFODU8Ir1ixgsHBQTo6Oujt7fVEsZnVTRGRdR0AmD9/fmzbti3rauRSpVKhq6sr62rkjqTtETE/63qYZc1DQ2ZmBecgMDMrOAeBmVnBOQjMzArOQWBmVnAOAjOzgnMQmJkVnIPAzKzgHARmZgXnIDAzKzgHgZlZwY0bBJKOk/SApC2SviVpmqQ+SY9Lur6qXE3LrHF8q0oza4Rarj56JfCliHhY0nrgCqA1IhZI+ltJZwJn17IsIn44cW+lWHyrSjNrlHF7BBGxLiIeTp/OBj4B3J0+3wJ0Al01LrMG8a0qzaxRar4fgaQFwExgJ/B8ungXcC7QXuOykftcBiwDmDNnDpVKpd76F9bg4CBDQ0NUKhV2795NpVJhaGiIwcFB/zuaWV1qCgJJJwBrgcuBzwAz0lXHkPQqdte47CARsRHYCMn9CHxN/dp1dHTQ2tpKV1fXm/cjKJfLdHR0+N4E
ZlaXWiaLpwH3AKsi4jlgO28N85xD0kOodZk1iG9VaWaNUkuPoIdkWKckqQR8HfikpJOBS4DzgQAeq2GZNYhvVWlmjXJYt6qUNBNYBHw3Il6oZ9kh9vki8FzdlTGAWcBLWVcih06PiNlZV8Isa01zz2I7fJK2+d67Zna4fGaxmVnBOQjMzArOQTA5bMy6AmaWX54jMDMrOPcIzMwKzkEwSUhqk3RG1vUws/xxEDQJSV+WdK6kOyQdP8r6Tel5GWNZSnIZkPFeZ7Wk+ZJaJP2upDMk/eGR1N3M8s1B0Dymp79vB65SYqokpcvvJrm0NwCSplY9ng18AThG0mZJ29Pf90sqV5WbTnLZj38CFpBcSfZHwMcn8H2ZWZPzZHFGJN0OnAG8ni46g+Qqrb8E2kju+7AS6CC5XMdBmwM/jYg/SAPhPuDRiLg53ffWiHjbJT0k/QkwOyKul/QN4JaIeErSWqA/Ir7X6PdpZs2v5stQW8MdAK6OiB8ASPoj4CfADuCc9JIcn5bUHRHV3+qnADdGxJ+ni04BtgInSdqcLjtL0v0k/7/3RsTfSDoV+CzQJ6kbOBART6XlbwK+KemyiHhtIt+0mTUfB0F2ArgzHfr5PrAZeBdwHvBsVblVkvZFxED6/APAe97cScRO4AZJjwKLI2J/2iO4dMTrXQj8JUnPYzXwvKSXgSeBU4GnSeYZbmvs2zSzZucgyM4M4GPAnPT3k8A1wDTghqpyt5AMEV2WPv8oyTwCAJJa04ejjvFJaiEZArxLUidwAvAhYC9Jb+FSSauAbVV3ojOzAvFkcXZOA34BHEcyN7CT5L4NX6wuFBGPAKenR/eI5EP8oaoiy0jmCN4Avp0OD52VThZvTtf1jNjnr4DfIRmGgmRO4lcNfXdmlhvuEWRA0rHAsSRH8NwK3Ah8DXgM+AhwYlpmQ0QcAP6C5C5vi4FKOvwz/E1/PbB+xP6fiIjLeLsWoCW92dBNJHMGkNyL+pXGvkszywsHQTY+Bfwd8I/AfwG+AqyJiDsl3Ql8GbgKGJB0YOTGkgZIjhy6HfjqKPs/ZozXbSMZeloD3BURg+nRSzOBfzmSN2Rm+eXDRzMwfA5AROxLh3tOioifVa1/d0RM2AezJIX/480s5SAwMys4TxabmRWcg8DMrOAcBGZmBecgMDMrOAeBmVnB/X9E+RinF3S2+gAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 4 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Box plots of founding year for the three cities with the largest total valuation\n",
    "df[df.城市.isin(城市.index[0:3])][['城市',\"成立年份\"]].groupby ( by = '城市' ).boxplot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0      中国\n",
      "1      中国\n",
      "2      中国\n",
      "3      美国\n",
      "4      美国\n",
      "       ..\n",
      "489    美国\n",
      "490    中国\n",
      "491    中国\n",
      "492    美国\n",
      "493    美国\n",
      "Name: 国家, Length: 494, dtype: object\n",
      "0      中国\n",
      "1      中国\n",
      "2      中国\n",
      "3      美国\n",
      "4      美国\n",
      "       ..\n",
      "489    美国\n",
      "490    中国\n",
      "491    中国\n",
      "492    美国\n",
      "493    美国\n",
      "Name: 国家, Length: 494, dtype: category\n",
      "Categories (24, object): [中国, 以色列, 卢森堡, 印度, ..., 西班牙, 阿根廷, 韩国, 马耳他]\n"
     ]
    }
   ],
   "source": [
    "# E2: the Categorical data type (compare dtype: object with dtype: category)\n",
    "print (df.国家)\n",
    "print (df.国家.astype('category'))\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 494 entries, 0 to 493\n",
      "Data columns (total 10 columns):\n",
      " #   Column        Non-Null Count  Dtype \n",
      "---  ------        --------------  ----- \n",
      " 0   排名            494 non-null    int64 \n",
      " 1   企业名称          494 non-null    object\n",
      " 2   Company Name  494 non-null    object\n",
      " 3   估值（亿人民币）      494 non-null    int64 \n",
      " 4   国家            494 non-null    object\n",
      " 5   城市            494 non-null    object\n",
      " 6   行业            494 non-null    object\n",
      " 7   掌门人/创始人       494 non-null    object\n",
      " 8   成立年份          494 non-null    int64 \n",
      " 9   部分投资机构        494 non-null    object\n",
      "dtypes: int64(3), object(7)\n",
      "memory usage: 38.7+ KB\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 494 entries, 0 to 493\n",
      "Data columns (total 10 columns):\n",
      " #   Column        Non-Null Count  Dtype   \n",
      "---  ------        --------------  -----   \n",
      " 0   排名            494 non-null    int64   \n",
      " 1   企业名称          494 non-null    object  \n",
      " 2   Company Name  494 non-null    object  \n",
      " 3   估值（亿人民币）      494 non-null    int64   \n",
      " 4   国家            494 non-null    category\n",
      " 5   城市            494 non-null    category\n",
      " 6   行业            494 non-null    category\n",
      " 7   掌门人/创始人       494 non-null    object  \n",
      " 8   成立年份          494 non-null    int64   \n",
      " 9   部分投资机构        494 non-null    object  \n",
      "dtypes: category(3), int64(3), object(4)\n",
      "memory usage: 36.2+ KB\n"
     ]
    }
   ],
   "source": [
    "# Convert the three categorical columns to the category dtype in one call;\n",
    "# astype returns a new DataFrame, so df itself is left unchanged\n",
    "df新 = df.astype({'国家': 'category', '城市': 'category', '行业': 'category'})\n",
    "df新.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0      中国\n",
       "1      中国\n",
       "2      中国\n",
       "3      美国\n",
       "4      美国\n",
       "       ..\n",
       "489    美国\n",
       "490    中国\n",
       "491    中国\n",
       "492    美国\n",
       "493    美国\n",
       "Name: 国家, Length: 494, dtype: category\n",
       "Categories (24, object): [中国, 以色列, 卢森堡, 印度, ..., 西班牙, 阿根廷, 韩国, 马耳他]"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df新.国家"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
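    "![Split-apply-combine diagram](https://jakevdp.github.io/PythonDataScienceHandbook/figures/03.08-split-apply-combine.png)\n",
    "## Pandas Split-Apply-Combine: The Basics\n",
    "\n",
    "* Split (分): divide the data into groups.\n",
    "* Apply (进): run a data method on each group to produce a per-group result.\n",
    "* Combine (合): merge the per-group results into a new DataFrame, ready for a report or the next computation."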
   ]
  },
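  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the three steps above on a tiny made-up table. The column names mirror the unicorn dataset, but the rows and the names `toy`, `groups` and `report` are illustrative assumptions, not part of the original analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Toy data: made-up rows for illustration only, not taken from the unicorn list\n",
    "toy = pd.DataFrame({\n",
    "    '城市': ['A', 'A', 'B', 'B'],\n",
    "    '估值（亿人民币）': [100, 300, 70, 130],\n",
    "})\n",
    "\n",
    "# Split: group the rows by a categorical column\n",
    "groups = toy.groupby('城市')\n",
    "\n",
    "# Apply and combine: aggregate a numeric column within each group;\n",
    "# pandas assembles the per-group results into one DataFrame\n",
    "report = groups['估值（亿人民币）'].agg(['count', 'sum', 'mean'])\n",
    "display(report)"
   ]
  },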
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
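    "### Today's milestone: starting to build \"data sense\"\n",
    "\n",
    "![DataTypes](02_split-apply-comine_data-types.png#full)\n",
    "\n",
    "### Why do data types and standards matter? Combining and separating domain knowledge, statistics, and information management"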
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# My summary for this week"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "281.390625px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
